Quantum tomography requires repeated measurements of many copies of the physical system, all prepared by a source in the unknown state. In the limit of very many copies measured, the often-used maximum-likelihood (ML) method for converting the gathered data into an estimate of the state works very well. For smaller data sets, however, it often suffers from problems of rank deficiency in the estimated state. For many systems of relevance for quantum information processing, the preparation of a very large number of copies of the same quantum state is still a technological challenge, which motivates us to look for estimation strategies that perform well even when there is not much data. After reviewing the concept of minimax state estimation, we use minimax ideas to construct a simple estimator for quantum states. We demonstrate that, for the case of tomography of a single qubit, our estimator significantly outperforms the ML estimator for small numbers of copies of the state measured. Our estimator is always full-rank, and furthermore, has a natural dependence on the number of copies measured, which is missing in the ML estimator.

Keywords: Quantum tomography; state estimation; minimax; maximum likelihood; 
Bayesian 

1. Introduction 

Tomography is the art of estimating the state of a system put out by a given source. For example, one might be interested in characterizing the polarization of a photon from a laser source; or two parties in a communication protocol want to know the state they jointly receive from a common source; or an experimentalist might want to verify that a source built in his lab to provide some target state is indeed meeting its specifications. The scope of tomography can be broadened to include parameter estimation, that is, to estimate a certain quantity of interest for some operational task (the fidelity between output state and target state, for example, or the expectation value of some fixed observable on the state). In this work, however, we will only deal with the most often discussed case of estimating the full state.

Tomography involves two steps: (i) the measurement of identical copies of the state; (ii) the conversion of the data collected from the measurement into an estimator for the state. In the simplest case, the measurement step (i) involves the same measurement on every copy of the state. More generally, it can be adaptive, that is, the measurement to be made on the kth copy can depend on information gathered from measuring the previous k − 1 copies. In the estimation step (ii), the simplest method gives a point estimator, which is a single state that represents our best guess of the identity of the true state. More generally, one can give a set of states compatible with the observed data that includes the true state with high probability. Such region estimators are known in classical estimation theory, and have appeared recently in the quantum arena […]. In our work, we will discuss the simplest case of repeated (non-adaptive) measurements, particularly measurements with the property of being symmetric and informationally complete, and focus on the issue of providing a point estimator.

The most popular procedure leading to a point estimator goes under the collective name of maximum-likelihood (ML) methods, first applied to quantum tomography by Hradil […]. ML methods prescribe as the point estimator the state with the largest likelihood of giving rise to the observed data, and there are numerous variations on this theme depending on the scenario in question (see Ref. […] for a good review). ML methods are particularly attractive because they do not require the choice of a prior distribution, a problem that plagues alternative methods based on Bayesian ideas (see, for example, Refs. 5 and 6). In the limit of a very large number of copies of the state measured, ML methods work very well: the likelihood function becomes so sharply peaked around the true state that one requires little sophistication to make a good guess.

However, for small sample sizes, ML methods are perhaps less well-motivated, and there is reason, as we will see in Section 2, to look for other methods in the estimation step. Besides, one should hardly expect ML methods to be the best choice for all scenarios, since their optimality is based on a particular figure-of-merit,ᵃ and whether this is a suitable figure-of-merit will undoubtedly depend on the task at hand. This motivates us to look beyond ML methods for alternative strategies appropriate for different tasks.

ᵃ Here is a comment that will likely make sense to the reader only upon reading the remainder of the paper: The ML estimator can be shown to be optimal in terms of minimizing the average risk, where the averaging uses the prior distribution $d\mu = dp_1\,dp_2\cdots dp_K$ for a state characterized by probabilities $\{p_k\}_{k=1}^K$. The estimation error is quantified by a cost function that assigns the value 0 only when the estimator and the true state are identical, and 1 otherwise (see, for example, Ref. 7). Hence, even though the ML estimator requires no choice of prior distribution in its construction, in judging its efficacy, one still requires a choice of prior distribution to quantify its average performance over all true states. This point will be reiterated later in the text.
A class of estimation procedures that requires no arbitrary or subjective choice
of prior distribution is the class of minimax methods. In a minimax procedure, 
one looks for an estimator, usually from within a specified class of estimators, that 
gives the smallest worst-case (over all true states) estimation error. This gives an 
optimality condition that holds regardless of the probability of occurrence of each 
true state. While such a "worst-case scenario" approach may be overly cautious for 
some purposes, it can be suitable and, in fact, necessary for tasks like cryptography 
where one would prefer to acknowledge ignorance rather than make a wrong guess. 

Minimax procedures are, unfortunately, notoriously difficult to implement, even for classical problems. This is hardly surprising since they involve a double optimization: first a maximization of the estimation error over all possible true states, followed by a minimization of this maximum over the class of estimators under consideration.

However, if we employ the commonly used mean squared error to quantify the estimation error, a minimax estimator with particularly nice features is known for the problem of a K-sided classical die. While the minimax estimator for the quantum analog of this problem is not known, we demonstrate here a general procedure to obtain a point estimator for the quantum problem that retains most of the desirable features of the classical minimax solution. This estimator is not minimax in the set of all estimators, as its classical analog is, but is minimax within a smaller class of estimators with mathematical structure motivated by the solution for the classical die problem. This quantum generalization of the minimax point estimator, despite being rather ad hoc in its construction, performs remarkably well in comparison to ML estimators for the qubit case investigated in detail. Furthermore, the estimator is easy to use as it requires no complicated numerical optimization. It can find utility as a good first guess for tomographic experiments, particularly if one only has access to a small number of copies of the state. Applying a similar procedure to adapt other known estimators for the classical problem to the quantum case might be equally fruitful.

Our goal here is partly to review the use of minimaxity as a means of choosing an estimation procedure. This is, of course, well known in the classical estimation theory community. In the quantum context, however, while minimax ideas have appeared in the quantum state estimation literature (see, for example, Refs. 8–10), they remain little explored. Here, we organize the ideas into a consistent programme (Section 2), and contribute by presenting a simple estimator for quantum states motivated by minimax considerations (Section 3). The geometrical properties of symmetric quantum measurements are discussed in an appendix, and two more appendices contain mathematical details.



2. Minimax estimation 



We first review two types of estimators, maximum-likelihood and mean estimators, before leading up to the idea of minimaxity. We also review the well-studied problem of the classical die, which serves two purposes: first, to define the notation and provide a concrete example for the application of the different estimation procedures; second, to provide guidance in the quantum problem studied in the next section. The reader should note, however, that the ideas of state estimation discussed in this section are equally applicable to the quantum problem. In moving to the quantum arena, there are significant differences in the setup of the problem that complicate the application of the state estimation procedures, but the ideas behind each procedure remain unchanged. This is a reminder that the classical state estimation literature has much to teach us, even in the quantum context.



2.1. The classical K-sided die 

Consider a K-sided die with faces labeled $k = 1, 2, \ldots, K$. The probability that face k turns up when the die is tossed is denoted by $p_k$. Tosses of the die are described by the probability distribution $\{p_k\}_{k=1}^K$, with $p_k \ge 0$ and $\sum_k p_k = 1$. Suppose we are given a die for which the probabilities are unknown, and we are allowed N tosses of that die to attempt an estimate of the $p_k$ values.

Let us discuss the tomography of a K-sided die using language suitable for quantum state tomography. We write the state of a die with probability distribution $\{p_k\}_{k=1}^K$ as $\rho = \sum_{k=1}^K |k\rangle p_k \langle k|$, where the ket $|k\rangle$ represents face k turning up in a toss of the die. We can think of $\{|k\rangle\}$ as a basis for the state space with an inner product defined such that $\langle k|l\rangle = \delta_{kl}$. A single toss of the die is then, in this language, a measurement in the basis $\{|k\rangle\}$.

We can describe this measurement formally as a probability operator measurement (POM) with outcomes $\Pi_1, \Pi_2, \ldots, \Pi_K$. To define a POM, the operators $\Pi_k$ must be non-negative and normalized to unit sum,

$$\Pi_k \ge 0 \ \text{ for all } k, \quad\text{with}\quad \sum_k \Pi_k = 1. \tag{1}$$

The measurement can be thought of as comprising K detectors, each corresponding to one of the POM outcomes $\Pi_k$. The probability that the kth detector clicks, if we have the input state $\rho$, is given by Born's rule

$$p_k = \mathrm{tr}\{\Pi_k \rho\}. \tag{2}$$

For the case of the K-sided die, a single toss of the die is described by the POM $\{\Pi_k = |k\rangle\langle k|\}$, and $p_k$ is simply the probability that face k turns up in a single toss of the die.

To improve the efficiency of the tomographic measurement, one often chooses a POM that is symmetric (S). The properties of S-POMs are the subject of Appendix A. Here, we are content with considering S-POMs that have rank-1 outcomes, in which case

$$\mathrm{tr}\{\Pi_k \Pi_l\} = \frac{d^2}{K^2}\left[\delta_{kl} + \frac{K-d}{d(K-1)}\,(1-\delta_{kl})\right] \quad\text{for all } k, l, \tag{3}$$

where d is the dimension of the state space and the dependence on k and l is in the Kronecker deltas only. In particular, for K = d, this covers the case of the classical die, for which this S-POM is informationally complete (IC). One then speaks of a SIC-POM.ᵇ

Given a SIC-POM, every state of the system can be written as

$$\rho = \sum_{k=1}^K p_k \Lambda_k, \tag{4}$$

where $p_k$ is computed via Born's rule (2) for the SIC-POM, and the $\Lambda_k$s are hermitian, unit-trace operators with $\mathrm{tr}\{\Pi_k \Lambda_l\} = \delta_{kl}$ as their defining property. Specifically, we have

$$\Lambda_k = \frac{(K-1)K}{(d-1)d}\left(\Pi_k - \frac{K-d}{(K-1)K}\right) \tag{5}$$

for the $\Pi_k$s of (3). For the K-sided die, $\Lambda_k = \Pi_k$. Equation (4) can be thought of as inverting Born's rule, that is, we can write down the state $\rho$ that will give rise to the probabilities $p_k$ via Born's rule for the SIC-POM. The set of probabilities $\{p_k\}$ thus provides a complete description of the state $\rho$; this is the sense in which the SIC-POM is informationally complete. We will often use the notation $\rho \mathrel{\widehat{=}} \{p_k\}$ to denote this relation. We will also occasionally use simply $\boldsymbol{p}$ to denote the list $\{p_1, p_2, \ldots, p_K\}$. Any set of probabilities $\{p_k\}$ is always understood to satisfy $p_k \ge 0$ for all k and $\sum_k p_k = 1$, and we sometimes refer to the set as a probability distribution.
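As a quick consistency check of (3) and (5), the following Python sketch uses exact rational arithmetic to confirm that the operators $\Lambda_k$ of (5) satisfy $\mathrm{tr}\{\Lambda_k\} = 1$ and $\mathrm{tr}\{\Pi_k\Lambda_l\} = \delta_{kl}$. It works only with the trace relations (3) and $\mathrm{tr}\{\Pi_k\} = d/K$, not with explicit matrices; the function names are illustrative, and the check assumes $d \ge 2$ so that the prefactor in (5) is finite.

```python
from fractions import Fraction


def trace_pi_pi(k, l, K, d):
    """tr{Pi_k Pi_l} for a symmetric POM with K rank-1 outcomes in dimension d, Eq. (3)."""
    delta = 1 if k == l else 0
    return Fraction(d * d, K * K) * (delta + Fraction(K - d, d * (K - 1)) * (1 - delta))


def dual_coefficients(K, d):
    """Return (a, b) such that Lambda_k = a * (Pi_k - b * 1), as in Eq. (5); needs d >= 2."""
    a = Fraction((K - 1) * K, (d - 1) * d)
    b = Fraction(K - d, (K - 1) * K)
    return a, b


def check_duality(K, d):
    """Check tr{Lambda_k} = 1 and tr{Pi_k Lambda_l} = delta_kl using trace relations only."""
    a, b = dual_coefficients(K, d)
    tr_pi = Fraction(d, K)  # tr{Pi_k}: the K outcomes sum to the identity, whose trace is d
    if a * (tr_pi - b * d) != 1:  # tr{Lambda_k} = a * (tr{Pi_k} - b * tr{1})
        return False
    for k in range(K):
        for l in range(K):
            # tr{Pi_k Lambda_l} = a * (tr{Pi_k Pi_l} - b * tr{Pi_k})
            if a * (trace_pi_pi(k, l, K, d) - b * tr_pi) != (1 if k == l else 0):
                return False
    return True


# Classical dice (K = d) and quantum SIC-POMs (K = d**2)
for K, d in [(2, 2), (6, 6), (4, 2), (9, 3)]:
    print(K, d, check_duality(K, d), dual_coefficients(K, d))
```

For K = d² the coefficients reduce to a = d(d+1) and ab = 1, which is the form $\Lambda_k = d(d+1)\Pi_k - 1$ quoted in Section 3.1.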

The goal of a tomographic problem, classical or quantum, is to provide a reasonable estimator for the true state $\rho$, given the data from performing a chosen IC-POM on every one of N identical copies of the input state. To be concrete, in the subsequent analysis, we represent the data from the N measurements as a sequence of clicks in the K possible detectors: $D_N = \{c_1, c_2, \ldots, c_N\}$, where $c_l \in \{1, 2, \ldots, K\}$ is the detector that clicked in the lth measurement. We can summarize the data by collecting together the number of clicks for each detector: $\{n_1, n_2, \ldots, n_K\}$, where $n_k$ is the number of times the kth detector clicked in the N measurements. We will use the notation $D_N \mathrel{\widehat{=}} \{n_1, n_2, \ldots, n_K\}$ to refer to the summary of a particular sequence of measurement outcomes. Note that the data must satisfy $\sum_{k=1}^K n_k = N$.
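A minimal Python sketch of the data summary described above; the function name `summarize_clicks` is hypothetical, and detectors are labeled 1 to K as in the text.

```python
from collections import Counter


def summarize_clicks(clicks, K):
    """Map a click record D_N = [c_1, ..., c_N] (labels 1..K) to the counts [n_1, ..., n_K]."""
    tally = Counter(clicks)
    bad = [c for c in tally if not 1 <= c <= K]
    if bad:
        raise ValueError(f"detector labels outside 1..{K}: {bad}")
    return [tally[k] for k in range(1, K + 1)]  # Counter returns 0 for detectors that never clicked


counts = summarize_clicks([1, 3, 1, 2, 1], K=3)
print(counts, sum(counts))  # the counts always sum to N, the length of the record
```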

A point estimator $\hat\rho$ is a map from the set $\mathcal{D} = \{D_1, D_2, \ldots, D_N, \ldots\}$ of all possible data to the set $\mathcal{S}$ of all possible (physical) states. Here, $D_N$ denotes all possible measurement outcomes on N copies of the state. For the classical die problem, for example, $D_N$ consists of all possible sequences of faces revealed in N tosses. The set $\mathcal{S}$ consists of all states $\rho = \sum_k p_k \Lambda_k$ where $\{p_k\}$ is a probability distribution. We denote the point estimator for data $D_N$ as $\hat\rho(D_N)$, and denote the set of all point estimators, that is, all maps $\hat\rho: \mathcal{D} \to \mathcal{S}$, by $\hat{\mathcal{S}}$.



ᵇ Here, we take the liberty to include the classical case under the general name of SIC-POM. In the
classical case, the SIC-POM is simply a projective measurement — a von Neumann measurement. 
A projective measurement is IC for the classical case, though not for the quantum case. 
2.2. The maximum-likelihood estimator

A very popular approach to a point estimator is the maximum-likelihood method. The ML method prescribes as the point estimator the state at which the likelihood function for the observed data attains its maximum. The likelihood function, that is, the probability that the state $\rho \mathrel{\widehat{=}} \{p_k\}$ gives rise to the data $D_N$, is

$$\mathcal{L}(D_N|\rho) = \prod_{k=1}^K p_k^{n_k}. \tag{6}$$

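To find the ML estimator for the classical die problem, we solve the following constrained maximization problem:

$$\max_{\rho \,\widehat{=}\, \{p_k\}} \mathcal{L}(D_N|\rho), \quad\text{subject to } \sum_k p_k = 1 \text{ with } p_k \ge 0 \text{ for all } k. \tag{7}$$

This gives the ML estimator $\hat\rho_{\rm ML} = \sum_k (\hat p_k)_{\rm ML}\,\Lambda_k$ with

$$(\hat p_k)_{\rm ML} = \frac{n_k}{N} \equiv \nu_k, \tag{8}$$

the relative frequency of clicks in the kth detector.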

For large N, the ML estimator for the classical die is intuitive: From a frequentist's perspective, the long-run ($N \to \infty$) relative frequencies $\nu_k$ should approach the probabilities $p_k$. What about small N? Suppose a coin (a "2-sided die") is tossed just once, and gives "heads". Hardly anyone will put his money on the estimator $\hat p_k = n_k/N$, which means setting $\hat p_{\rm head} = 1$ and $\hat p_{\rm tail} = 0$. This lack of confidence in the estimator is well justified if one considers the fact that, for $D_1 \mathrel{\widehat{=}} \{1, 0\}$, the likelihood function is not very sharply peaked at $p_{\rm head} = 1$, and $p_{\rm head}$ values near 1 have similar likelihood. Suppose we make more tosses, and always get heads. Then, we gain confidence in the estimator $\hat p_{\rm head} = 1$ and $\hat p_{\rm tail} = 0$, as is reflected by the likelihood function getting more and more sharply peaked at $p_{\rm head} = 1$; see Fig. 1.

Observe that the ML estimator in (8) depends only on the relative frequencies $\nu_k$, and not on N, the total number of tosses made. For the above example where $D_N \mathrel{\widehat{=}} \{N, 0\}$, the ML estimator is always $\hat p_{\rm head} = 1$ and $\hat p_{\rm tail} = 0$ for all N. Only the confidence (loosely quantified by the width of the likelihood function) in the estimator changes with N. In many situations, only the point estimator, and not the confidence interval associated with it, is carried forward into subsequent analysis. However, a statement that $\hat p_{\rm head} = 1$ if N = 1 is clearly not of the same standing as saying $\hat p_{\rm head} = 1$ after N = 10,000 tosses. This invites us to look for a point estimator that itself reflects our changing level of confidence as N changes.
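The N-independence of the ML estimator is easy to see in code. The sketch below (hypothetical helper `ml_estimate`, exact fractions for readability) returns the relative frequencies of (8); for all-heads data it reports $\hat p_{\rm head} = 1$ whatever N is.

```python
from fractions import Fraction


def ml_estimate(counts):
    """ML estimator of Eq. (8): the relative frequencies nu_k = n_k / N."""
    N = sum(counts)
    if N == 0:
        raise ValueError("at least one toss is needed")
    return [Fraction(n, N) for n in counts]


# All-heads data D_N = {N, 0}: the point estimate is the same for every N.
for N in (1, 2, 5, 10, 10_000):
    p_head, p_tail = ml_estimate([N, 0])
    print(N, p_head, p_tail)
```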

Fig. 1. Likelihood function for a 2-sided die (coin) with data $D_N \mathrel{\widehat{=}} \{N, 0\}$, for N = 1, 2, 5, 10 and 100.

Another peculiarity of the ML estimator is visible in Fig. 1. The point estimator is reported as a point on the boundary of the allowed values for $p_{\rm head}$. This corresponds to the statement that tails can never occur. In general, rank deficiency in the estimator, that is, the existence of at least one pure state $|\psi\rangle$ on which $\hat\rho_{\rm ML}$ has no support, $\langle\psi|\hat\rho_{\rm ML}|\psi\rangle = 0$, says that a detector that projects into the rank-deficient sector can never click, a statement that cannot be justified with finite N. Yet, the ML estimator is frequently rank-deficient whenever N is of the order of K, such that there is non-negligible probability that at least one of the detectors has no clicks. This invites us to look for a point estimator that is full-rank for all finite N.
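To put a number on "N of the order of K", the sketch below computes, by inclusion-exclusion, the probability that at least one of K detectors records no click in N trials. It assumes, purely for illustration, that every detector is equally likely to click ($p_k = 1/K$); whenever a detector stays silent, the ML estimator (8) assigns it $\hat p_k = 0$. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def prob_some_detector_silent(K, N):
    """Probability that at least one of K detectors gets no click in N trials.

    Illustrative assumption: every trial clicks detector k with probability 1/K.
    Inclusion-exclusion over the number j of detectors forced to stay silent.
    """
    total = Fraction(0)
    for j in range(1, K + 1):
        total += (-1) ** (j + 1) * comb(K, j) * Fraction(K - j, K) ** N
    return total


for N in (2, 4, 8, 16, 32):
    print(N, float(prob_some_detector_silent(4, N)))
```

For K = 4 (the qubit SIC-POM size) and N = 4, the probability is 29/32, so a rank-deficient ML estimate is the typical outcome at that sample size.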



2.3. Mean estimators 

An alternative to the ML estimator that takes N into account is suggested by Fig. 1. For small N, the likelihood is significant for a large region of $p_{\rm head}$ values around the maximum; for large N, the likelihood rapidly drops as we move away from the maximum. This suggests using the likelihood function as a weight to construct a point estimator. We weigh each state $\rho$ of the system by its likelihood, given data $D_N$, and perform an average over all states to obtain the mean estimator

$$\hat\rho_{\rm ME}(D_N) = \frac{\int d\phi(\rho)\,\mathcal{L}(D_N|\rho)\,\rho}{\int d\phi(\rho)\,\mathcal{L}(D_N|\rho)}, \tag{9}$$

where $d\phi$ is an integration measure that tells us how to perform a sum over states; $d\phi$ should be non-negative on all physical states of the system, and zero elsewhere. We can require, in addition, that $\int d\phi(\rho) = 1$ for interpretation of $d\phi$ as a probability distribution. This is, however, not necessary, and we only require that $d\phi$ is not too pathological, so that the integrals in (9) exist.
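For a coin, the mean estimator (9) is a ratio of one-dimensional integrals over $p_{\rm head}$, which the following sketch evaluates with a simple midpoint rule. The `weight` argument stands in for $d\phi$; the uniform weight used by default is only one possible choice, and all names are illustrative. With that weight the estimate for all-heads data is $(N+1)/(N+2)$ (Laplace's rule of succession), which, unlike the ML value, approaches 1 only as N grows.

```python
def mean_estimate_coin(n_head, n_tail, weight=lambda p: 1.0, grid=20_000):
    """Mean estimator (9) for a coin, parameterized by p = p_head in [0, 1].

    weight(p) * dp plays the role of d(phi); the likelihood (6) is p**n_head * (1-p)**n_tail.
    The integrals are approximated by the midpoint rule on `grid` cells.
    """
    h = 1.0 / grid
    num = den = 0.0
    for i in range(grid):
        p = (i + 0.5) * h
        w = weight(p) * p ** n_head * (1.0 - p) ** n_tail  # d(phi) times likelihood
        num += w * p
        den += w
    p_head = num / den
    return p_head, 1.0 - p_head


# All-heads data: the estimate creeps toward 1 as N grows, unlike the ML value.
for N in (1, 2, 5, 10, 100):
    print(N, round(mean_estimate_coin(N, 0)[0], 4))
```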

A reader familiar with Bayesian methods will recognize that the use of a prior distribution $d\mu(\rho)$, which encapsulates the experimenter's prior information about the probability of occurrence of each state $\rho$, fits within this framework of mean estimators. If one chooses $d\phi = d\mu$, then by Bayes's theorem, $d\phi(\rho)\,\mathcal{L}(D_N|\rho)$ is proportional to the posterior distribution $d\mu(\rho|D_N)$, that is, the probability for state $\rho$ given data $D_N$. The mean estimator for this choice of $d\phi$ is then simply the mean state for the posterior distribution. This particular mean estimator has a long history in Bayesian estimation, and is also sometimes used in the quantum literature.

For us, the integration measure $d\phi$ need not be chosen to represent our prior information about the identity of the true state, but is a functional parameter that we can adjust to satisfy desired optimality conditions. By varying $d\phi$, we can describe a reasonable class of point estimators, the class of mean estimators constructed as in (9), that is,

$$\hat{\mathcal{S}}_{\rm ME} = \{\hat\rho_{\rm ME}[d\phi]\}. \tag{10}$$

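For the case of a K-sided die, the mean estimator can be written more explicitly as

$$\hat\rho_{\rm ME} = \sum_{k=1}^K (\hat p_k)_{\rm ME}\,\Lambda_k, \quad\text{with}\quad (\hat p_k)_{\rm ME} = \frac{\int (d\boldsymbol{p})\,\delta\big(1-\sum_{l=1}^K p_l\big)\, f(\boldsymbol{p})\,\mathcal{L}(D_N|\boldsymbol{p})\, p_k}{\int (d\boldsymbol{p})\,\delta\big(1-\sum_{l=1}^K p_l\big)\, f(\boldsymbol{p})\,\mathcal{L}(D_N|\boldsymbol{p})}. \tag{11}$$

Here, we parameterize the states of the die by their probabilities $\{p_k\}$. The integration measure is written explicitly as $d\phi(\boldsymbol{p}) = (d\boldsymbol{p})\,\delta\big(1-\sum_{l=1}^K p_l\big)\, f(\boldsymbol{p})$, where $(d\boldsymbol{p})$ denotes the volume element $dp_1\,dp_2\cdots dp_K$ (with each $p_k \ge 0$), the delta function enforces $\sum_l p_l = 1$, and $f(\boldsymbol{p})$ is a non-negative function that we can choose to suit our needs.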

A natural symmetry in the K-sided die problem lies in the labeling of the different faces as 1, 2, ..., K: A permutation of these arbitrary labels does not change the physical description of the die. This symmetry should be reflected in the choice of $f(\boldsymbol{p})$ as an invariance under permutation of the label k. A particularly simple choice of f with this invariance is

$$f(\boldsymbol{p}) = \Big(\prod_k p_k\Big)^{\beta-1} \quad\text{with } \beta > 0. \tag{12}$$
The resulting mean estimator for this choice of f is

$$(\hat p_k)_{\rm ME} = \frac{n_k + \beta}{N + K\beta}. \tag{13}$$

There is an alternate way of arriving at this estimator following ML ideas. Suppose we obtained data $D_N \mathrel{\widehat{=}} \{n_1, n_2, \ldots, n_k, \ldots, n_K\}$. We add "fake counts" to every detector, of an amount $\beta$, so that the data becomes $D_{N,\beta} \mathrel{\widehat{=}} \{n_1+\beta, n_2+\beta, \ldots, n_k+\beta, \ldots, n_K+\beta\}$ and the total number of counts appears to be $N + K\beta$. Then, the ML estimator for this modified data is exactly that given in (13). This estimator is sometimes referred to as the "add-β" estimator, and is used as an ad hoc procedure to avoid reporting an ML estimator that lies on the boundary. Note that the $\beta \to 0$ limit of (13) corresponds exactly to the ML estimator of (8), but one cannot use the $\beta = 0$ version of (12) in (11), since the integrals then diverge whenever some $n_k = 0$.
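The closed form (13) and its "fake counts" reading can be checked side by side. In this sketch (hypothetical function names, exact fractions), `add_beta_estimate` evaluates (13) directly, while `ml_with_fake_counts` applies the ML rule (8) to the padded data $D_{N,\beta}$; the two agree, and setting β = 0 recovers the ML estimator.

```python
from fractions import Fraction


def add_beta_estimate(counts, beta):
    """Mean estimator (13): (n_k + beta) / (N + K * beta)."""
    K, N = len(counts), sum(counts)
    return [(n + beta) / (N + K * beta) for n in counts]


def ml_with_fake_counts(counts, beta):
    """ML rule (8) applied to the padded data D_{N,beta} = {n_k + beta}."""
    padded = [n + beta for n in counts]
    total = sum(padded)  # equals N + K * beta
    return [x / total for x in padded]


data = [1, 0]  # a single toss that came up heads
for beta in (Fraction(0), Fraction(1, 2), Fraction(1)):
    print(beta, add_beta_estimate(data, beta), ml_with_fake_counts(data, beta))
```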



2.4. Assessing the quality of an estimation procedure 

How well does a particular estimation procedure perform? To answer this question, we need to define a figure-of-merit that quantifies how far from the true state an estimator is. Given data $D_N$, we compute the error in our guess $\hat\rho(D_N)$ of the true state,

$$\text{estimation error} = E(\hat\rho, \rho, D_N) = \mathrm{dist}\big(\hat\rho(D_N), \rho\big), \tag{14}$$

where $\mathrm{dist}(\hat\rho, \rho)$ is often chosen to be a formal distance between the two states $\hat\rho$ and $\rho$, like the trace distance or the Euclidean distance. More generally, it is a function that assigns a (non-negative) "cost" whenever $\hat\rho \ne \rho$.

Of course, we are not going to get the same data $D_N$ every time we perform tomography. Instead, one should assess the efficacy of the estimation procedure $\hat\rho$ by averaging the estimation error over all possible data $D_N$ that one could have obtained. Using terminology standard in estimation theory (see, for example, Ref. 11), this gives the risk

$$R_N(\hat\rho, \rho) = \sum_{D_N} \mathcal{L}(D_N|\rho)\, E(\hat\rho, \rho, D_N). \tag{15}$$

The risk $R_N(\hat\rho, \rho)$ still only tells us how good the estimator $\hat\rho$ is for a given true state $\rho$. But we have to judge the merits of an estimation procedure while not knowing the identity of the true state (hence the need for tomography). If there exists a $\hat\rho$ such that the risk $R_N(\hat\rho, \rho)$ for all true states $\rho$ is smaller than that of any other estimation procedure, this $\hat\rho$ will clearly be the best procedure to use. However, an estimator with such miraculous properties is not likely to exist.
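For the die, the sum over data sequences in (15) can be organized by count vectors, each weighted by the multinomial probability of observing those counts. The sketch below does this by brute-force enumeration, using the squared error $\sum_k(\hat p_k - p_k)^2$ as the cost (one choice among many, as noted after (14)); it is practical only for small N and K, and the names are illustrative. For the ML estimator the result should reproduce the standard value $\sum_k p_k(1-p_k)/N$.

```python
from math import factorial, prod


def count_vectors(N, K):
    """All (n_1, ..., n_K) with n_k >= 0 and n_1 + ... + n_K = N."""
    if K == 1:
        yield (N,)
        return
    for first in range(N + 1):
        for rest in count_vectors(N - first, K - 1):
            yield (first,) + rest


def multinomial_prob(counts, p):
    """Probability of observing these counts: number of sequences times prod_k p_k**n_k, cf. Eq. (6)."""
    n_sequences = factorial(sum(counts))
    for n in counts:
        n_sequences //= factorial(n)
    return n_sequences * prod(pk ** n for pk, n in zip(p, counts))


def risk(estimator, p, N):
    """Risk (15) with the squared-error cost sum_k (hat p_k - p_k)**2, by enumerating all data."""
    total = 0.0
    for counts in count_vectors(N, len(p)):
        p_hat = estimator(counts)
        cost = sum((e - pk) ** 2 for e, pk in zip(p_hat, p))
        total += multinomial_prob(counts, p) * cost
    return total


def ml(counts):
    N = sum(counts)
    return [n / N for n in counts]


p_true = [0.2, 0.3, 0.5]
print(risk(ml, p_true, 6), sum(pk * (1 - pk) for pk in p_true) / 6)
```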

Instead, suppose we only ask that the estimator performs well "on average" over the true states. For example, a $\hat\rho$ that gives a large risk for a particular state $\rho_0$ but small risk values for all other true states can be considered a good estimation procedure as long as the probability that $\rho_0$ is indeed the true state is tiny compared to other states. This requires some knowledge about the probability distribution of the true states, that is, the prior distribution $d\mu(\rho)$. If we know the prior distribution, a figure-of-merit that can be used to assess an estimation procedure is its average performance over the true states, that is, the risk weighted by the prior distribution,

$$F_N(\hat\rho, d\mu) = \int d\mu(\rho)\, R_N(\hat\rho, \rho). \tag{16}$$

We refer to $F_N$ as the average risk. This includes the case where one does know the identity of the true state to be some state $\tau$: $d\mu(\rho) = d\rho\,\delta(\rho - \tau)$, so that $F_N(\hat\rho, d\mu) = R_N(\hat\rho, \tau)$. The case where $\hat\rho$ performs poorly only on a single state $\rho_0$ out of a possible (discrete) set of states $\{\rho_0, \rho_1, \rho_2, \ldots\}$ involves using the prior $\mu(\rho_i) = q_i$ for $\rho_i$ in this set and 0 otherwise, with $q_0 \ll q_{i \ne 0}$. A large risk for $\rho_0$ is suppressed in $F_N$ by a small enough value of $q_0$.

Given a prior distribution, the average risk quantifies the efficacy of the estimator $\hat\rho$. To find the best estimation procedure among the set $\hat{\mathcal{S}}$ of all possible $\hat\rho$s, one minimizes the average risk:

$$\min_{\hat\rho \in \hat{\mathcal{S}}} F_N(\hat\rho, d\mu) \quad \text{(Bayes)}. \tag{17}$$

An estimator (not necessarily unique) that minimizes the average risk is known as a Bayes estimator for the prior distribution $d\mu$. Note that this requirement of choosing a prior to assess the efficacy of an estimation procedure in terms of average risk applies even for schemes like ML methods which, by themselves, do not require a choice of prior or integration measure.

Bayes estimators are well studied in the state estimation literature. The following factᶜ relates mean estimators to Bayes estimators, which we will find useful later (for a self-contained proof of this fact, see Appendix B):

Fact 1. Suppose we choose the square of the Euclidean distance, the squared error, to define the estimation error:

$$E(\hat\rho, \rho, D_N) = \mathrm{dist}\big(\hat\rho(D_N), \rho\big) = \mathrm{tr}\big\{(\hat\rho(D_N) - \rho)^2\big\}. \tag{18}$$

Then, the unique Bayes estimator for prior distribution $d\mu$ is the mean estimator $\hat\rho_{\rm ME}[d\mu]$.
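Fact 1 can be illustrated (not proved) for a coin with the uniform prior $d\mu = dp_{\rm head}$ on [0, 1]. For that prior the mean estimator is the add-1 rule $(n+1)/(N+2)$, and the sketch below estimates the average risk (16) of it and of the ML rule by numerical integration, with the squared error (18) written in terms of the coin probabilities. The names and the integration grid are illustrative choices. The mean estimator comes out ahead, with average risks close to $1/[3(N+2)]$ versus $1/(3N)$.

```python
from math import comb


def coin_risk(estimator, p, N):
    """Risk (15) for a coin with p_head = p; the squared error (18) is 2 * (q - p)**2 for a coin."""
    total = 0.0
    for n in range(N + 1):  # n = number of heads
        prob = comb(N, n) * p ** n * (1 - p) ** (N - n)
        q = estimator(n, N)  # estimated p_head
        total += prob * 2 * (q - p) ** 2
    return total


def average_risk(estimator, N, grid=4000):
    """Average risk (16) for the uniform prior d(mu) = dp on [0, 1], via the midpoint rule."""
    h = 1.0 / grid
    return h * sum(coin_risk(estimator, (i + 0.5) * h, N) for i in range(grid))


def ml(n, N):
    return n / N


def mean_uniform(n, N):
    return (n + 1) / (N + 2)  # mean estimator (13) with beta = 1, i.e. for the uniform prior


for N in (1, 3, 10):
    print(N, average_risk(ml, N), average_risk(mean_uniform, N))
```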

The prior distribution $d\mu(\rho)$ encapsulates our knowledge, not of the source at hand, but of the preparer of the source. Imagine that the preparer, say Alice, has promised to provide a source that puts out identical copies of a state $\rho$. Alice is, however, free to choose which particular state $\rho$ is. Our information about the probability that Alice provides us with a source that puts out state $\rho$ is given by $d\mu(\rho)$. The data $D_N$ collected from measuring a single instance of the source provided by Alice cannot yield us any information about $d\mu(\rho)$ (other than excluding states of the source that could not have given rise to $D_N$). The prior $d\mu(\rho)$ must therefore reflect prior knowledge about the preparer gathered from previous interaction with different sources provided by Alice.

In most tomographic scenarios, such prior knowledge is absent, and it seems desirable to say "we don't know". Converting the heuristic notion of "we don't know" into a rigorous "uninformative prior" is, unfortunately, fraught with difficulties. For example, it would seem natural to assign equal probabilities to all states, in the absence of knowledge of which states are more probable. This is not a problem for a discrete set of states labeled by a discrete label i: it simply says that all the $q_i$s are equal. For a continuous set of states, which we parameterize using some continuous parameter x, equal probability of occurrence means setting $d\mu(\rho) = dx$. However, there is no unique way of parameterizing the states, and equal probability in one parameterization in general does not translate into equal probability in a different parameterization. One often-used way to deal with this is to give up the idea of equal probabilities for all states, and ask for an uninformative prior with the property of parameterization invariance, for a relevant class of parameterizations. An example is the Jeffreys prior (see, for example, Ref. 7, p. 181 for a discussion), which is scale-invariant, that is, invariant under reparameterization $x \to x^m$ for some power m.

ᶜ This is Corollary 1.2 in Chapter 4 of Ref. 11.



2.5. Minimaxity 

Assessing an estimator according to its average risk and using a Bayes estimator for $d\mu$ only works well if the true distribution describing Alice the preparer is indeed $d\mu$. Given that we do not usually know the prior distribution, and since even choosing something like an uninformative prior is far from straightforward, using a Bayes estimator for some choice of $d\mu$ seems poorly justified. Minimax approaches offer a way out of this.

Instead of using the average risk as a figure-of-merit, an alternative is to use the worst-case risk, that is, the maximum risk (over all possible true states) of using estimator $\hat\rho$. This does away with the requirement of choosing a prior distribution to perform the averaging of the risk. The best estimator is found by minimizing the worst-case risk:

$$\min_{\hat\rho \in \hat{\mathcal{S}}}\, \max_{\rho}\, R_N(\hat\rho, \rho) \quad \text{(minimax)}. \tag{19}$$

An estimator (not necessarily unique) that minimizes the worst-case risk is known as a minimax estimator.

Carrying out this double optimization to find a minimax estimator is, of course, non-trivial. There is, however, a factᵈ that can sometimes simplify the search for a minimax estimator (see Appendix C for a self-contained proof):

Fact 2. An estimator with constant risk that is also a Bayes estimator for some prior distribution $d\mu$ is a minimax estimator. If the estimator is also the unique Bayes estimator for some $d\mu$, then it is the unique minimax estimator.

Relating minimaxity to Bayes estimators is useful because much more is known about Bayes estimators than minimax estimators. For example, Facts 1 and 2 tell us that in scenarios where the squared error is the suitable figure-of-merit, the unique minimax estimator can be found by looking for a mean estimator $\hat\rho \in \hat{\mathcal{S}}_{\rm ME}$ with constant risk (if it exists). We will make use of this in the next subsection.



ᵈ This is Corollary 1.5 in Chapter 5 of Ref. 11.
2.6. The minimax estimator for the K-sided die

For tomography with SIC-POMs, the squared error has the simple form

$$\mathrm{tr}\big\{(\hat\rho(D_N) - \rho)^2\big\} = \frac{(K-1)K}{(d-1)d}\sum_k (\hat p_k - p_k)^2, \tag{20}$$

where $\hat\rho = \sum_k \hat p_k \Lambda_k$ and $\rho = \sum_k p_k \Lambda_k$. The squared error is proportional to the sum of squares of the differences between the probabilities $\{\hat p_k\}$ and $\{p_k\}$. The corresponding risk is thus nothing more than the mean squared error (MSE) commonly used in classical state estimation.

It is easy to work out the expression for the MSE for the class of mean estimators for the K-sided die given in (13). In particular, there exists a special value of β (as used in (13)) such that the MSE is independent of $\rho$, that is, constant over all states,

$$R_N(\hat\rho_{\rm ME}, \rho) = \frac{1 - 1/K}{(\sqrt{N}+1)^2} \quad\text{for } \beta = \frac{\sqrt{N}}{K}. \tag{21}$$



Using Facts 1 and 2, we know that the mean estimator given in (13) with this value of β is also the unique minimax estimator for the K-sided die problem, with the choice of the squared error as the figure-of-merit. This minimax property objectively justifies the choice of the integration measure $d\phi = (d\boldsymbol{p})\,\delta\big(1-\sum_l p_l\big)\,f(\boldsymbol{p}) = (d\boldsymbol{p})\,\delta\big(1-\sum_l p_l\big)\big(\prod_k p_k\big)^{\beta-1}$ with $\beta = \sqrt{N}/K$.
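The value β = √N/K can be traced with a short calculation, sketched here for the die (K = d), where the prefactor in (20) equals 1. The counts $n_k$ are multinomially distributed with mean $Np_k$ and variance $Np_k(1-p_k)$, so the MSE of the estimator (13) splits into a variance term and a squared-bias term:

```latex
\begin{aligned}
R_N(\hat\rho_{\rm ME},\rho)
  &= \sum_{k=1}^K \frac{N p_k(1-p_k) + \beta^2 (1-Kp_k)^2}{(N+K\beta)^2} \\
  &= \frac{N - K\beta^2 + \left(K^2\beta^2 - N\right)\sum_k p_k^2}{(N+K\beta)^2}
\end{aligned}
```

The dependence on the true state enters only through $\sum_k p_k^2$, and its coefficient vanishes exactly when $K^2\beta^2 = N$, that is, β = √N/K; the remaining constant is $(N - N/K)/(N+\sqrt{N})^2 = (1-1/K)/(\sqrt{N}+1)^2$.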

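For this choice of β, we can write the minimax estimator $\hat\rho_{\rm MM} = \sum_k (\hat p_k)_{\rm MM}\,\Lambda_k$ for the K-sided die in a form that exhibits its structure clearly,

$$(\hat p_k)_{\rm MM} = \frac{a_N}{K} + b_N \nu_k, \quad\text{with}\quad a_N = \frac{1}{\sqrt{N}+1}, \quad b_N = \frac{\sqrt{N}}{\sqrt{N}+1}. \tag{22}$$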

The parameters $a_N$ and $b_N$ depend only on N and satisfy the relation $a_N + b_N = 1$. Observe that $a_N$ approaches zero as N gets large, while $b_N$ approaches unity, so that the minimax estimator approaches the ML estimator $(\hat p_k)_{\rm ML} = \nu_k$. For N small, $a_N$ is significant, and the two estimators differ.
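A minimal sketch of (22), together with a brute-force check of its constant risk for a coin (K = 2), in Python. The helper names and the grid of true states used to approximate the maximum in (19) are illustrative choices; for N = 4 the minimax risk should equal $(1-1/2)/(2+1)^2 = 1/18$ at every $p_{\rm head}$, while the ML risk peaks at 1/8 for a fair coin.

```python
from math import comb, sqrt


def minimax_estimate(counts):
    """Eq. (22): (hat p_k)_MM = a_N / K + b_N * nu_k."""
    K, N = len(counts), sum(counts)
    if N == 0:
        return [1.0 / K] * K  # no data: a_N = 1, b_N = 0
    a = 1.0 / (sqrt(N) + 1.0)
    b = sqrt(N) / (sqrt(N) + 1.0)
    return [a / K + b * n / N for n in counts]


def ml_estimate(counts):
    N = sum(counts)
    return [n / N for n in counts]


def coin_risk(estimator, p, N):
    """Squared-error risk (15) for a coin with p_head = p, summing over the number of heads."""
    total = 0.0
    for n in range(N + 1):
        est = estimator([n, N - n])
        prob = comb(N, n) * p ** n * (1 - p) ** (N - n)
        total += prob * ((est[0] - p) ** 2 + (est[1] - (1 - p)) ** 2)
    return total


N = 4
grid = [i / 100 for i in range(101)]  # true states used to approximate the max in (19)
worst_ml = max(coin_risk(ml_estimate, p, N) for p in grid)
worst_mm = max(coin_risk(minimax_estimate, p, N) for p in grid)
print(worst_ml, worst_mm, (1 - 1 / 2) / (sqrt(N) + 1) ** 2)
```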

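Observe that, unlike the ML estimator, this minimax estimator is always full-rank for finite N, since

$$\langle\psi|\hat\rho_{\rm MM}|\psi\rangle = \sum_k (\hat p_k)_{\rm MM}\,|\langle k|\psi\rangle|^2 \ge \frac{a_N}{K} > 0 \tag{23}$$

for any pure state $|\psi\rangle$ of the system.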

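We can also compute the purity of the minimax estimator,

$$\mathrm{tr}\{\hat\rho_{\rm MM}^2\} = \sum_{k=1}^K (\hat p_k)_{\rm MM}^2 = \frac{1}{K} + b_N^2\Big(\sum_k \nu_k^2 - \frac{1}{K}\Big) \le 1 - \Big(1 - \frac{1}{K}\Big)\big(1 - b_N^2\big), \tag{24}$$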
which is strictly less than 1 for finite N. The equality is attained when the data
is such that all clicks are in a single detector. Having the purity bounded away 
from 1 is immediately obvious from the fact that the estimator is always full-rank. 
However, the expression for the purity reveals more interesting features. For K fixed, 
as N increases, the bound on the purity increases towards 1, and the estimator can 
approach a pure state, expressing our increasing confidence in claiming a definite 
pure state as we gather more data. Also, for N fixed, the purity of the estimator 
decreases as K increases. This is also intuitive: If K is large, we would require more 
data to convince ourselves that certain detectors will never click. 
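A short sketch can check the purity bound (24) and confirm that single-detector data saturate it. It uses only the Python standard library, and the helper names are ours.

```python
import math


def minimax_die_estimate(counts):
    """Eq. (22) with a_N = 1/(sqrt(N)+1) and b_N = 1 - a_N."""
    N, K = sum(counts), len(counts)
    a_N = 1.0 / (math.sqrt(N) + 1.0)
    return [a_N / K + (n / N) * (1.0 - a_N) for n in counts]


def purity(p):
    """tr{rho^2} = sum_k p_k^2 for the die, whose outcomes are orthogonal."""
    return sum(pk * pk for pk in p)


def purity_bound(N, K):
    """Right-hand side of (24): 1 - (1 - 1/K)(1 - b_N^2)."""
    b_N = math.sqrt(N) / (math.sqrt(N) + 1.0)
    return 1.0 - (1.0 - 1.0 / K) * (1.0 - b_N ** 2)


for K in (2, 6):
    for N in (1, 10, 100, 1000):
        all_in_one = minimax_die_estimate([N] + [0] * (K - 1))  # saturates (24)
        print(f"K={K} N={N}: purity={purity(all_in_one):.4f}, bound={purity_bound(N, K):.4f}")
```

The printed bound grows towards 1 with $N$. For fixed $N$ it is smaller for larger $K$, in line with the discussion above.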

The minimax estimator for the $K$-sided die problem circumvents both complaints
we had about the ML estimator: it has an explicit dependence on $N$, and it is never
rank-deficient for finite $N$. In the remainder of this paper, we adapt this minimax
estimator to the quantum problem while retaining these two desirable properties.



3. The quantum problem 



In this section, we turn to the tomography of a quantum system. We begin by
pointing out the differences between the classical and the quantum problems
(Sections 3.1 and 3.2). These considerations provide clues to adapting the minimax
estimator of the classical die problem to the quantum context (Sections 3.3 and 3.4).



3.1. SIC-POM for a quantum system 

In moving from the classical to the quantum problem, the first difference we meet is 
that the IC-POM that one can perform on the quantum system for full tomography
is not unique. This has to do with the fact that there is no unique preferred basis
such that all quantum states are diagonal in that basis. We can, however, still
choose to make use of a SIC-POM, which offers efficiency advantages over other
choices of IC-POM. Related to the lack of a unique preferred basis is the fact
that, unlike in the classical case, the POM outcomes of a SIC-POM are no longer
mutually orthogonal: $\mathrm{tr}\{\Pi_k\Pi_l\} \neq 0$ for $k \neq l$. According to Appendix A, a
SIC-POM for a quantum system has $K = d^2$, and the $\Lambda_k$ operators of (5) take the
simple form $\Lambda_k = d(d+1)\Pi_k - 1$. In our discussion below, we will only consider
such a SIC-POM for tomography of a quantum system.

For a single qubit, that is, a two-dimensional quantum system, the SIC-POM
is the tetrahedron measurement, with POM outcomes proportional to projectors
onto the legs of a regular tetrahedron inscribed within the Bloch sphere. The
tetrahedron measurement is non-unique in that the orientation of the tetrahedron
within the Bloch sphere is left to the choice and convenience of the experimenter.
Nevertheless, given a particular orientation, the POM outcomes of the tetrahedron
measurement can be written in terms of the Pauli vector operator
$\vec\sigma = (\sigma_x, \sigma_y, \sigma_z)$ as

$$\Pi_k = \frac{1}{4}\bigl(1 + \vec a_k\cdot\vec\sigma\bigr), \qquad k = 1,2,3,4, \tag{25}$$

where each $\vec a_k$ is one of the four legs of the tetrahedron. The tetrahedron unit vectors
$\vec a_k$ satisfy $\vec a_k\cdot\vec a_l = -\frac{1}{3}$ for $k \neq l$. Their linear dependence is captured by the facts that
they sum to zero, $\sum_k \vec a_k = 0$, and are complete, $\frac{3}{4}\sum_k \vec a_k\vec a_k = 1$. The probability
of obtaining the $k$th outcome $\Pi_k$ for a qubit state $\rho = \frac{1}{2}(1 + \vec s\cdot\vec\sigma)$ is given by
$p_k = \mathrm{tr}\{\rho\Pi_k\} = \frac{1}{4}(1 + \vec a_k\cdot\vec s)$, and $\Lambda_k = 6\Pi_k - 1$.
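The tetrahedron POM and its dual operators can be checked numerically. The sketch below assumes NumPy and one particular orientation of the tetrahedron, with the vectors pointing to alternate corners of a cube. This orientation is our choice for illustration; any rotated version would serve equally well.

```python
import numpy as np

# Pauli matrices sigma_x, sigma_y, sigma_z stacked as a (3, 2, 2) array.
sigma = np.array([[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]], dtype=complex)
identity = np.eye(2, dtype=complex)

# One choice of orientation: unit vectors a_k pointing to alternate cube corners.
a = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)


def bloch_operator(v):
    """Return v . sigma for a real 3-vector v."""
    return np.einsum("i,ijk->jk", v, sigma)


Pi = np.array([0.25 * (identity + bloch_operator(ak)) for ak in a])  # Eq. (25)
Lam = 6 * Pi - identity  # Lambda_k = d(d+1) Pi_k - 1 with d = 2


def probabilities(s):
    """Born rule p_k = tr{rho Pi_k} for rho = (1 + s . sigma)/2."""
    rho = 0.5 * (identity + bloch_operator(s))
    return np.real(np.einsum("jk,nkj->n", rho, Pi))


def reconstruct(p):
    """State reconstruction rho = sum_k p_k Lambda_k."""
    return np.einsum("n,njk->jk", p, Lam)


s = np.array([0.3, -0.5, 0.2])
print(probabilities(s))  # equals (1 + a_k . s)/4
print(np.allclose(reconstruct(probabilities(s)), 0.5 * (identity + bloch_operator(s))))
```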

Before we describe the quantum problem further, let us make a side remark 
regarding the choice of figure-of-merit. For the classical die problem, while the 
state of the die can be gathered into a single operator $\rho = \sum_k p_k\Lambda_k$, $\rho$ is but a
book-keeping device for the probabilities $\{p_k\}$ one is truly concerned with. The
squared error, which directly measures how much the estimated probabilities differ 
from the true probabilities, is thus a natural way to quantify the estimation error. 
Of course, one can use the Euclidean distance (rather than its square), but taking 
the square has analytical advantages. One just needs to note that doubling the 
difference in probabilities quadruples the squared error. 

For the quantum problem, however, things are different. If the purpose of the 
quantum tomography is to predict the outcome of a future measurement of the same 
SIC-POM used to perform tomography, then the $p_k$s are again the only quantities
of relevance, and the use of the squared error is, as in the classical case, rather 
natural. However, if one's goal for tomography is to predict outcomes of a different 
measurement that can yield information complementary to that provided by the 
tomographic SIC-POM, then quantifying the estimation error in terms of differences 
in the probabilities {pk} may not be suitable. Instead, one might choose to use, for 
example, the fidelity or the trace distance between the estimator and the true state.ᵃ
Nevertheless, for calculational ease, in the remainder of the paper, we shall continue 
to use the squared error as our figure-of-merit. 

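Up to a constant factor, the squared error is also the (square of the) Euclidean distance between the
estimator and true state when viewed as vectors in the Hilbert-Schmidt space: for a SIC-POM,
$\mathrm{tr}\{(\hat\rho - \rho)^2\} = d(d+1)\sum_k (\hat p_k - p_k)^2$. We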
emphasize that there is no single figure-of-merit that is suitable for all situations, 
but it should be chosen in accordance with the task at hand. 

3.2. Physicality constraints 

Consider any probability distribution $\{p_k\}$. Does $\{p_k\}$ always correspond to outcome
probabilities that can be obtained by applying Born's rule for a SIC-POM to a
physical state of the system? Equivalently, one can ask whether $\rho = \sum_k p_k\Lambda_k$, for
the SIC-POM we are considering, describes a physical state of the system, that is,
whether $\rho$ has unit trace and is non-negative, for any probability distribution $\{p_k\}$.

ᵃNote, however, that for the qubit problem, the square of the trace distance is proportional to the
squared error.
To answer this question, let us examine the quantity $p^2 \equiv \sum_k p_k^2$. Since
$\sum_k p_k = 1$, the minimum value of $p^2$ is attained when all the $p_k$s are equal. This
gives

$$p^2 \geq \frac{1}{K}, \tag{26}$$

true for any probability distribution $\{p_k\}$.

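What about the maximum value of $p^2$? Any physical state, whether quantum or
classical, must satisfy $\mathrm{tr}\{\rho^2\} \leq 1$. Using $\rho = \sum_k p_k\Lambda_k$, and writing $\Lambda_k = a\Pi_k + b$ for
a SIC-POM ($a$ and $b$ can be deduced from (5)), it is easy to show that
$\mathrm{tr}\{\rho^2\} = \sum_k p_k\,\mathrm{tr}\{\rho\Lambda_k\} = ap^2 + b$, so that $\mathrm{tr}\{\rho^2\} \leq 1$
implies $p^2 \leq (1-b)/a$. A $K$-sided classical die problem has $a = 1$ and $b = 0$, which
gives

$$\frac{1}{K} \leq (p^2)_{K\text{-sided die}} \leq 1. \tag{27}$$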



This is satisfied for any probability distribution $\{p_k\}$: the physicality requirement
$\mathrm{tr}\{\rho^2\} \leq 1$ does not constrain the $p_k$s further. In fact, for the classical problem,
$\rho = \sum_k p_k\Lambda_k$ is physical for any probability distribution.

For the quantum problem with a SIC-POM, however, we have a different situation.
In this case, $a = d(d+1)$ and $b = -1$, which gives

$$\frac{1}{K} \leq (p^2)_{\rm quantum} \leq \frac{2}{d(d+1)}. \tag{28}$$

The right side of the inequality is strictly less than 1 for $d > 1$. For example, the
qubit problem with the tetrahedron measurement has the physicality constraint

$$\frac{1}{4} \leq (p^2)_{\rm qubit} \leq \frac{1}{3}. \tag{29}$$



Only probability distributions $\{p_k\}$ that obey (28) can correspond to a physical
quantum state. For example, $\{p_1 = 1, p_{2,3,4} = 0\}$ does not correspond to a physical
qubit state. This is a direct manifestation of the non-orthogonality of the POM
outcomes comprising a quantum SIC-POM. Note that (29) is also sufficient for the
qubit problem: Any probability distribution $\{p_k\}$ satisfying (29) corresponds to a
physical qubit state. For higher-dimensional quantum systems, there are additional
physicality constraints, apart from (28), that $\{p_k\}$ must satisfy.
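For the qubit, the sufficiency of (29) rests on the identity $\sum_k p_k^2 = \frac14 + \frac{1}{12}|\vec s|^2$. This follows from $p_k = \frac14(1 + \vec a_k\cdot\vec s)$ together with $\sum_k \vec a_k = 0$ and $\frac34\sum_k \vec a_k\vec a_k = 1$. As a result, (29) is equivalent to $|\vec s| \leq 1$. The small sketch below checks this identity numerically and turns (29) into a physicality test. It assumes NumPy and an arbitrary tetrahedron orientation, and `is_physical_qubit` is a hypothetical helper name.

```python
import numpy as np

# Tetrahedron unit vectors in an arbitrary orientation.
a = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)


def tetra_probs(s):
    """p_k = (1 + a_k . s)/4 for Bloch vector s."""
    return 0.25 * (1.0 + a @ s)


def is_physical_qubit(p, tol=1e-12):
    """Test (29): a probability distribution {p_k} is a qubit state iff sum_k p_k^2 <= 1/3."""
    p = np.asarray(p, dtype=float)
    is_distribution = np.all(p >= -tol) and abs(p.sum() - 1.0) < tol
    return bool(is_distribution and np.sum(p ** 2) <= 1.0 / 3.0 + tol)


print(is_physical_qubit([1, 0, 0, 0]))        # the example in the text: not physical
print(is_physical_qubit(tetra_probs(-a[0])))  # pure state on the boundary: physical
```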

Additional physicality constraints on $\{p_k\}$ mean that the expression for the
mean estimator for a quantum state is not just the expression for the classical die
problem given in (11). For example, in the qubit problem, the mean estimator is
$$\rho_{\rm ME} = \sum_{k=1}^{K} (p_k)_{\rm ME}\,\Lambda_k, \qquad \text{with} \quad (p_k)_{\rm ME} = \frac{\int (dp)\, \eta\bigl(\frac{1}{3} - \sum_l p_l^2\bigr) f(p)\, \mathcal{L}(D_N|p)\, p_k}{\int (dp)\, \eta\bigl(\frac{1}{3} - \sum_l p_l^2\bigr) f(p)\, \mathcal{L}(D_N|p)}. \tag{30}$$

Here, $\eta(\ )$ is Heaviside's unit step function: $\eta(x) = 0$ if $x < 0$, and $\eta(x) = 1$ if
$x \geq 0$. The step function enforces the upper bound in (29). With this expression,
one can define an optimization procedure over all possible $f(p)$ functions to look for
a minimax estimator for the qubit (for example, using the MSE as the risk). This
is, of course, difficult to perform. What would be simpler is an $f(p)$ for which the
MSE is constant for all qubit states, which would then give a minimax estimator
according to Facts 1 and 2. Unfortunately, our preliminary attempts at this yielded
a function $f(p)$ that flies in the face of common sense.



3.3. Adapting the classical minimax estimator to the quantum problem

Instead of tackling the difficult problem of finding a minimax estimator, we can try 
to build a simple estimator for quantum states by adapting the minimax estimator 
from the classical die problem. For the quantum problem, the ML estimator has
the same problems of having no dependence on $N$ and of suffering from rank
deficiency. The ML estimator for the quantum case tells us to report, as the point
estimator, the physical quantum state at which the likelihood function attains its
maximum. For the qubit problem discussed above, this corresponds to looking for
the maximum of $\mathcal{L}(D_N|p)$ subject not only to the usual constraint $\sum_k p_k = 1$, but
also to the additional constraint $\sum_k p_k^2 \leq \frac{1}{3}$. Whenever the data $D_N$ are such that
$\sum_k \nu_k^2 \leq \frac{1}{3}$, the ML estimator is unmodified from the classical case: $(p_k)_{\rm ML} = \nu_k$;
when the inequality is violated, the ML estimator gives a state on the boundary
of the Bloch sphere (that is, a rank-deficient state) such that $\sum_k (p_k)_{\rm ML}^2 = \frac{1}{3}$. Our
goal will be to adapt the minimax estimator from the classical die problem in such
a way that we arrive at an estimator that has a reasonable dependence on $N$ and
does not suffer from rank deficiency.

For the moment, let us put aside the desire for a full-rank estimator, and focus 
on establishing a point estimator that is always physical for the quantum problem. 
Suppose we begin with the expression for the $(p_k)_{\rm MM}$s for the $K$-sided die problem
in (22), which we rename here as

$$p_{k,0} = \frac{1}{K}a_N + \nu_k b_N. \tag{31}$$
We use these probabilities to construct an estimator for the quantum problem with
a $K$-outcome SIC-POM in accordance with (5),

$$\rho_0 = \sum_{k=1}^{K} p_{k,0}\,\Lambda_k. \tag{32}$$

We emphasize that the $\Lambda_k$s in (32) are for the quantum SIC-POM.

Let us examine the quantity $p^2$ for $\{p_{k,0}\}$, which we will denote as $p_0^2$. This was
computed previously in (24), which, with $K = d^2$ and $b_N = 1/(1 + 1/\sqrt{N})$, can be rewritten as

$$p_0^2 \leq \frac{1}{d^2} + \Bigl(1 - \frac{1}{d^2}\Bigr)\frac{1}{\bigl(1 + 1/\sqrt{N}\bigr)^2}. \tag{33}$$

Observe that this inequality is weaker, for $N \geq 1$, than the physicality constraint

$$p_0^2 \leq \frac{2}{d(d+1)}$$

in (28), necessary for $\rho_0$ to be a physical state. This means that there
always exist data sets (for example, all but one detector have zero clicks) for which
$\rho_0$ is not physical and, therefore, fails to be a valid estimator for the quantum
problem.

Nevertheless, observe that data such that all but one detector have zero clicks 
occur only with a small probability for values of N that are not too small. If the 
minimax estimator is physical for most data that we are likely to encounter, then 
there is some hope that this estimator can still work well for quantum states. After 
all, the quantum problem with a $K$-outcome SIC-POM looks very similar, from
the perspective of the outcome probabilities, to the classical $K$-sided die problem
as long as we are away from the boundary where the physicality constraints come
into play.

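Suppose we perform a "correction" to the estimator $\rho_0$ whenever it is unphysical,
by admixing just enough of the maximally mixed state to make the overall mixture
physical. We take this new mixture to be the estimator

$$\hat\rho = (1-\lambda)\rho_0 + \frac{\lambda}{d} = \sum_k \hat p_k\Lambda_k, \qquad \text{with } \hat p_k = (1-\lambda)p_{k,0} + \frac{\lambda}{d^2}, \tag{34}$$

where $\lambda \geq 0$ is to be chosen as small as possible such that $\hat\rho$ is a physical state (the
second equality uses $\sum_k \Lambda_k = d$ for a SIC-POM). For the qubit problem, one can be
more explicit: If $\rho_0$ is a physical qubit state, $\lambda = 0$; otherwise, $\lambda$ is chosen such that
$\hat p^2 = \sum_k \hat p_k^2 = \frac{1}{3}$. It follows that $\lambda$, for the qubit
problem, is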



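$$\lambda = \begin{cases} 0 & \text{if } \sum_k \nu_k^2 \leq \bigl(\sum_k \nu_k^2\bigr)_{\rm phy}, \\[1ex] 1 - \sqrt{\dfrac{\bigl(\sum_k \nu_k^2\bigr)_{\rm phy} - \frac{1}{4}}{\sum_k \nu_k^2 - \frac{1}{4}}} & \text{otherwise}. \end{cases} \tag{35}$$

Here, $\bigl(\sum_k \nu_k^2\bigr)_{\rm phy}$ is the largest value of $\sum_k \nu_k^2$ such that the data will give a physical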

18 H.K. Ng and B.-G. Englert 



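$\rho_0$, that is,

$$\Bigl(\sum_k \nu_k^2\Bigr)_{\rm phy} = \frac{1}{4} + \frac{1}{12}\Bigl(1 + \frac{1}{\sqrt{N}}\Bigr)^2, \tag{36}$$

as implied by (24).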
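The correction (34), with $\lambda$ taken from (35) and (36), can be sketched as follows for the qubit. The sketch relies on the relation $\sum_k \hat p_k^2 - \frac14 = (1-\lambda)^2\bigl(\sum_k p_{k,0}^2 - \frac14\bigr)$, which follows from (34) with $d = 2$. The function names are ours. `admix_to_physical` accepts any classical estimate as $p_{k,0}$, and the closed form (35) is evaluated separately as a cross-check.

```python
import math


def minimax_p0(counts):
    """Eq. (31): p_{k,0} = a_N/K + nu_k b_N with a_N = 1/(sqrt(N)+1), b_N = 1 - a_N."""
    N, K = sum(counts), len(counts)
    a_N = 1.0 / (math.sqrt(N) + 1.0)
    return [a_N / K + (n / N) * (1.0 - a_N) for n in counts]


def admix_to_physical(p0, target=1.0 / 3.0):
    """Eq. (34) for the qubit (d = 2, K = 4): p_k = (1 - lam) p_{k,0} + lam/4.

    lam >= 0 is the smallest weight with sum_k p_k^2 <= target (target = 1/3 is (29)).
    Uses sum_k p_k^2 - 1/4 = (1 - lam)^2 (sum_k p_{k,0}^2 - 1/4).  Returns (lam, p_hat).
    """
    excess = sum(pk * pk for pk in p0) - 0.25
    allowed = target - 0.25
    lam = 0.0 if excess <= allowed else 1.0 - math.sqrt(allowed / excess)
    return lam, [(1.0 - lam) * pk + lam / 4.0 for pk in p0]


def lam_eq35(counts):
    """Closed form (35), with the threshold (sum_k nu_k^2)_phy of (36)."""
    N = sum(counts)
    sum_nu2 = sum((n / N) ** 2 for n in counts)
    phy = 0.25 + (1.0 + 1.0 / math.sqrt(N)) ** 2 / 12.0
    if sum_nu2 <= phy:
        return 0.0
    return 1.0 - math.sqrt((phy - 0.25) / (sum_nu2 - 0.25))


for counts in ([3, 3, 2, 2], [10, 0, 0, 0], [6, 4, 0, 0]):
    lam, p_hat = admix_to_physical(minimax_p0(counts))
    print(counts, round(lam, 4), round(lam_eq35(counts), 4), round(sum(x * x for x in p_hat), 4))

# The same function accepts other classical estimates, e.g. the relative frequencies nu_k.
print(admix_to_physical([n / 10 for n in [6, 4, 0, 0]]))
```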

Equation (34) provides a simple prescription for converting any estimator $\rho_0$
from classical problems, not necessarily the minimax estimator we have used here,
into an estimator for the quantum problem. For example, one can get a good
approximation to the ML estimator this way. Suppose we ignore physicality constraints
and look for the ML estimator for given data subject only to the constraints
that the $p_k$s are non-negative with unit sum. This is exactly the ML estimator
if the problem is classical. We take this classical ML estimator for $\rho_0$, and apply
the prescription given in (34). This gives an estimator that is very close (for
example, in terms of fidelity) to the one obtained from the ML scheme where one
performs constrained maximization (taking physicality constraints into account) of
the likelihood function.

To demonstrate the effectiveness of the estimator given in (34), we plot in Fig. 2
the maximum and minimum (over all possible input states) risks, measured using
the MSE, for our estimator in the qubit case (labeled in Fig. 2 as 'min, max risk
($\varepsilon_N = 0$)'). For comparison, we also plot the corresponding risk values for the ML
estimator (labeled in Fig. [2] as 'min, max risk (ML)'). Observe that the maximum 
error for our estimator is significantly smaller than that for the ML estimator, 
indicating a step closer towards a true minimax estimator. For N not too small, 
note also the much smaller difference between the maximum and minimum risk 
values for our qubit estimator as compared to the case for the ML estimator. This 
near-homogeneity of risk values over all states is inherited from the original classical 
minimax estimator that has constant risk. Risk homogeneity is attractive since it 
reflects an equal treatment of all input states, without having to implement the 
subjective construct of a uniform prior for a continuous set of states. 
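A brute-force version of the risk computation behind Fig. 2 can be sketched as follows for $\varepsilon_N = 0$. For a Bloch vector $\vec s$, the sketch sums the squared error in the tetrahedron probabilities over all count vectors, each weighted by its multinomial probability. This is our own illustrative reconstruction, not the authors' code. The maximum and minimum over the Bloch ball are only approximated by a finite random sample of states, and the tetrahedron orientation is arbitrary.

```python
import itertools
import math

import numpy as np

# Tetrahedron unit vectors (one arbitrary orientation).
A = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)


def count_vectors(N, K=4):
    """Yield every (n_1, ..., n_K) of non-negative integers summing to N."""
    for bars in itertools.combinations(range(N + K - 1), K - 1):
        edges = (-1,) + bars + (N + K - 1,)
        yield [edges[i + 1] - edges[i] - 1 for i in range(K)]


def p0_estimate(counts):
    """Uncorrected classical minimax estimate p_{k,0} of (31) with K = 4."""
    N = sum(counts)
    a_N = 1.0 / (math.sqrt(N) + 1.0)
    return a_N / 4 + (np.array(counts) / N) * (1 - a_N)


def corrected_estimate(counts):
    """Estimator (34) with the smallest lam giving sum_k p_k^2 <= 1/3 (eps_N = 0)."""
    p0 = p0_estimate(counts)
    excess, allowed = np.sum(p0 ** 2) - 0.25, 1 / 3 - 0.25
    lam = 0.0 if excess <= allowed else 1.0 - math.sqrt(allowed / excess)
    return (1 - lam) * p0 + lam / 4


def risk(s, N, estimator):
    """Exact MSE risk R_N at Bloch vector s: sum over data of L(D_N|p) sum_k (phat_k - p_k)^2."""
    p = 0.25 * (1 + A @ s)
    total = 0.0
    for counts in count_vectors(N):
        likelihood = float(math.factorial(N))
        for n, pk in zip(counts, p):
            likelihood *= pk ** n / math.factorial(n)
        total += likelihood * np.sum((estimator(counts) - p) ** 2)
    return total


N = 6
rng = np.random.default_rng(0)
directions = rng.normal(size=(30, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
states = [np.zeros(3)] + [r * u for r in (0.5, 0.9, 1.0) for u in directions]
risks = [risk(s, N, corrected_estimate) for s in states]
print("min, max risk over sampled states:", min(risks), max(risks))
print("constant risk of the uncorrected p0:", 0.75 / (1 + math.sqrt(N)) ** 2)
```

In probability space the physical set is a ball, and (34) moves $p_{k,0}$ radially onto that ball. Hence the corrected estimate is never farther from the true probabilities than $p_{k,0}$ is. Its risk therefore cannot exceed the constant risk $\frac34(1+\sqrt{N})^{-2}$ of the uncorrected die estimator.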



3.4. Modifying the estimator to be full-rank — a minimax estimator 

In correcting the minimax estimator from the classical problem for physicality in 
the quantum case, we have lost the feature that the resulting estimator is always 
full-rank: Whenever $\rho_0$ is unphysical, we correct it by choosing $\lambda$ just large enough
to exactly cancel the most negative eigenvalue of $\rho_0$, and so to give a non-negative
state with (at least) one zero eigenvalue. In this section, we attempt to remedy this 
rank deficiency using a minimax approach. 

Fig. 2. The solid curves plot the minimum and maximum risk (over all qubit states) for the
optimal value of $\varepsilon_N$. The dash-dotted curves are the corresponding values for setting $\varepsilon_N = 0$ for
all values of $N$. The dotted curves correspond to the risk values for the ML estimator, and the
curves with a circular marker give risk values for ML modified by an $\varepsilon_N$ parameter as described
in the text.

Let us focus on the qubit problem with the tetrahedron measurement. We consider
the same estimator as before in (34) with $d = 2$. Now, rather than choosing $\lambda$
such that the physicality constraint on $\hat p^2$ is saturated ($\hat p^2 = \frac{1}{3}$), we choose $\lambda$ such
that we saturate the constraint except for an overall factor of $(1 - \varepsilon_N)$, for some
parameter $\varepsilon_N \geq 0$. More precisely, we set $\lambda = 0$ whenever $p_0^2 \leq \frac{1}{3}(1 - \varepsilon_N)$; otherwise,
$\lambda$ is chosen to ensure $\hat p^2 = \frac{1}{3}(1 - \varepsilon_N)$. The latter case can be written more explicitly
as an equation for $\lambda$,

$$(1-\lambda)^2\, b_N^2 \Bigl(\sum_k \nu_k^2 - \frac{1}{4}\Bigr) = \frac{1 - 4\varepsilon_N}{12}. \tag{37}$$
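Equation (37) can be solved for $\lambda$ in closed form. The minimal sketch below assumes $0 \leq \varepsilon_N \leq \frac14$. This keeps the target value $\frac13(1-\varepsilon_N)$ from falling below the minimum $\frac14$ allowed by (26). The helper names are ours.

```python
import math


def lam_eq37(counts, eps):
    """Mixing weight lam for the qubit: 0 if p_0^2 <= (1 - eps)/3, else the root of (37) in [0, 1]."""
    if not 0.0 <= eps <= 0.25:
        raise ValueError("eps_N must lie in [0, 1/4]")
    N = sum(counts)
    b_N = math.sqrt(N) / (math.sqrt(N) + 1.0)
    sum_nu2 = sum((n / N) ** 2 for n in counts)
    p0_sq = 0.25 + b_N ** 2 * (sum_nu2 - 0.25)   # the equality in (24) with K = 4
    if p0_sq <= (1.0 - eps) / 3.0:
        return 0.0
    return 1.0 - math.sqrt((1.0 - 4.0 * eps) / (12.0 * b_N ** 2 * (sum_nu2 - 0.25)))


def estimate(counts, eps):
    """p_hat_k = (1 - lam) p_{k,0} + lam/4, i.e. (34) with d = 2 and p_{k,0} from (31)."""
    N = sum(counts)
    a_N = 1.0 / (math.sqrt(N) + 1.0)
    lam = lam_eq37(counts, eps)
    return [(1 - lam) * (a_N / 4 + (n / N) * (1 - a_N)) + lam / 4 for n in counts]


for eps in (0.0, 0.05, 0.2):
    p_hat = estimate([10, 0, 0, 0], eps)
    print(eps, round(lam_eq37([10, 0, 0, 0], eps), 4), round(sum(x * x for x in p_hat), 4))
```

For $\varepsilon_N > 0$, the corrected purity stays strictly below $\frac13$. The estimate is therefore strictly inside the Bloch ball, i.e., full rank.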



What remains is to choose the value of $\varepsilon_N$. For this, we make use of a minimax
procedure: Find the best value of $\varepsilon_N$ by minimizing the worst-case risk, that is,

$$\min_{\varepsilon_N \geq 0}\, \max_{\rho}\, R_N(\rho, \hat\rho), \tag{38}$$

where $\hat\rho$ is the estimator constructed from (34) with $\lambda$ chosen (when necessary)
to satisfy (37). Equations (34) and (37) together define a class of estimators $\mathcal{S}_\varepsilon$
parameterized by $\varepsilon_N$. The solution of the optimization problem stated in (38) is a
minimax estimator in the restricted class $\mathcal{S}_\varepsilon$ of estimators.
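A grid search can approximate the optimization (38). In this sketch, the maximum over all qubit states is replaced by a maximum over a finite sample of Bloch vectors, and $\varepsilon_N$ is restricted to a grid. Both are simplifications we introduce for illustration, so the result is only as good as the sample and the grid. The risk code repeats a brute-force enumeration over count vectors, so $N$ should be kept small. All helper names are ours.

```python
import itertools
import math

import numpy as np

A = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)


def count_vectors(N, K=4):
    for bars in itertools.combinations(range(N + K - 1), K - 1):
        edges = (-1,) + bars + (N + K - 1,)
        yield [edges[i + 1] - edges[i] - 1 for i in range(K)]


def estimate(counts, eps):
    """Member of S_eps: (31) followed by (34), lam fixed by sum_k p_k^2 <= (1 - eps)/3."""
    N = sum(counts)
    a_N = 1.0 / (math.sqrt(N) + 1.0)
    p0 = a_N / 4 + (np.array(counts) / N) * (1 - a_N)
    excess, allowed = np.sum(p0 ** 2) - 0.25, (1 - eps) / 3 - 0.25
    lam = 0.0 if excess <= allowed else 1.0 - math.sqrt(allowed / excess)
    return (1 - lam) * p0 + lam / 4


def risk(s, N, eps):
    """Exact MSE risk R_N(rho, rho_hat) for Bloch vector s."""
    p = 0.25 * (1 + A @ s)
    total = 0.0
    for counts in count_vectors(N):
        likelihood = float(math.factorial(N))
        for n, pk in zip(counts, p):
            likelihood *= pk ** n / math.factorial(n)
        total += likelihood * np.sum((estimate(counts, eps) - p) ** 2)
    return total


def sample_states(n_dirs=20, radii=(0.6, 0.9, 1.0), seed=1):
    """Finite stand-in for the Bloch ball: the centre, random directions, and the +/- a_k axes."""
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(n_dirs, 3))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    dirs = np.vstack([dirs, A, -A])
    return [np.zeros(3)] + [r * u for r in radii for u in dirs]


def minimax_eps(N, eps_grid, states):
    """Grid-search version of (38): the eps_N minimizing the maximum sampled risk."""
    worst = [max(risk(s, N, eps) for s in states) for eps in eps_grid]
    best = int(np.argmin(worst))
    return eps_grid[best], worst


N = 4
eps_grid = np.linspace(0.0, 0.2, 11)
states = sample_states()
best_eps, worst = minimax_eps(N, eps_grid, states)
print("optimal eps_N on the grid:", best_eps)
print("worst-case risk at eps_N = 0 and at the optimum:", worst[0], min(worst))
```

Refining the grid and enlarging the state sample tightens the approximation, at the cost of run time.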

Figure 3 reports the optimal values of $\varepsilon_N$ as a function of $N$, with $\rho_0$ defined as
in (31) and (32) and restricted to the qubit case (labeled in Fig. 3 as 'Optimized $\varepsilon_N$
for our estimator'). The performance of the estimator with the optimal value of $\varepsilon_N$ is
plotted in Fig. 2 (labeled in Fig. 2 as 'min, max risk'). Observe that as $N$ grows, the
difference in performance between optimizing the value of $\varepsilon_N$ and choosing $\varepsilon_N = 0$
(that is, the estimator discussed in Section 3.3) rapidly diminishes. If desired, for
practical convenience, one can set $\varepsilon_N = 0$ for $N > 100$. That $\varepsilon_N$ approaches 0 as
$N$ increases is particularly rewarding because it is in line with our intuition that
for small $N$, we have little evidence that can support reporting a point estimator
that is close to a rank-deficient state; however, as $N$ increases, we gather more and
more data and gain confidence in reporting a state that is closer and closer to a
rank-deficient state, as described by our estimator with $\varepsilon_N$ approaching zero.

Fig. 3. Optimized value of $\varepsilon_N$ for the qubit problem for different values of $N$, with $\rho_0$ defined as
in (31) and (32), as well as for the ML estimator modified by an $\varepsilon_N$ parameter.

For comparison, we have also plotted the performance of the ML estimator, 
with the modification that one restricts the domain of the maximization of the 
likelihood function to states such that $p^2 \leq \frac{1}{3}(1 - \varepsilon_N)$, where $\varepsilon_N$ is again chosen
via the same minimax procedure as above. This simple modification removes the
problem of rank deficiency of the usual ML estimator. In fact, as can be seen from
Fig. 2 (line labeled 'min, max risk (ML with $\varepsilon_N$)'), it significantly improves the
maximum risk for the ML estimator, although it does not do nearly as well as the 
estimator discussed above. 

Our approach to a restricted minimax estimator can be extended beyond the 
qubit case and beyond using a $\rho_0$ that comes from the classical minimax estimator.
As mentioned in Section 3.3, one can begin with one's favorite classical estimator
and admix enough of the completely mixed state to ensure physicality of the
resulting estimator. To fix the rank-deficiency problem, one can then use a similar
minimax procedure as in (38) to find the best estimator that avoids the physicality
boundary. In the qubit case, a single parameter $\varepsilon_N$ was sufficient to delineate the
physicality boundary and characterize the relevant class of estimators. For higher
dimensions, physicality constraints are more complicated (and, in fact, are often
not well understood), and one would typically require more than one parameter to
define the analog of $\mathcal{S}_\varepsilon$. Nevertheless, the same minimax procedure is applicable.

Variants of our estimator are also possible. For example, one can treat both $\varepsilon_N$
and $a_N$ (with $a_N = 1 - b_N$) in (31) as parameters that we choose in a minimax
fashion. Another variant, for the case of the tetrahedron POM for the qubit, suggests
itself when we examine (37), which determines $\lambda$ whenever $p_0^2 > \frac{1}{3}(1 - \varepsilon_N)$: The
choice $b_N^2 = 1 - 4\varepsilon_N$ gives a particularly simple value for $\lambda$ that depends only on
the relative frequencies $\nu_k$, but not on the parameters $b_N$ and $\varepsilon_N$. One then performs
minimax optimization over $\varepsilon_N$ only. Both variants give results very similar to our
estimator above for the case of the tetrahedron POM for the qubit.
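The claim about the second variant can be checked directly. With $b_N^2 = 1 - 4\varepsilon_N$, (37) gives $\lambda = 1 - (12\sum_k\nu_k^2 - 3)^{-1/2}$ whenever $\sum_k \nu_k^2 > \frac13$, and $\lambda = 0$ otherwise. The sketch below treats $b_N$ as a free parameter and compares the general solution of (37) with this simplified form. The function names are ours.

```python
import math


def lam_general(counts, eps, b_N):
    """lam solving (37) for given b_N and eps_N (zero when no correction is needed)."""
    N = sum(counts)
    sum_nu2 = sum((n / N) ** 2 for n in counts)
    p0_sq = 0.25 + b_N ** 2 * (sum_nu2 - 0.25)
    if p0_sq <= (1.0 - eps) / 3.0:
        return 0.0
    return 1.0 - math.sqrt((1.0 - 4.0 * eps) / (12.0 * b_N ** 2 * (sum_nu2 - 0.25)))


def lam_variant(counts):
    """Simplified lam for b_N^2 = 1 - 4 eps_N: depends on the relative frequencies only."""
    N = sum(counts)
    sum_nu2 = sum((n / N) ** 2 for n in counts)
    return 0.0 if sum_nu2 <= 1.0 / 3.0 else 1.0 - 1.0 / math.sqrt(12.0 * sum_nu2 - 3.0)


for eps in (0.0, 0.1, 0.2):
    b_N = math.sqrt(1.0 - 4.0 * eps)
    print(eps, lam_general([6, 4, 0, 0], eps, b_N), lam_variant([6, 4, 0, 0]))
```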



4. Conclusion 

We demonstrated a simple procedure for adapting the minimax estimator for the 
classical die problem to the quantum case of a single qubit with the tetrahedron 
measurement. We obtained an estimator that inherited desirable properties from the 
classical version: (i) It is always full rank and contains a reasonable N dependence; 
(ii) it has much smaller maximum risk, as measured by the mean squared error, 
compared to the popular ML estimator; (iii) it gives nearly constant risk over all 
states and hence treats all possible states in a fair manner. 

The procedure of admixing a sufficient amount of the completely mixed state 
to obtain a physical and full-rank estimator can be applied to any estimator ap- 
propriate for the analogous classical problem. For typical data and most states, 
the classical estimator is usually physical; it is only the rare case that requires a
physicality correction. This automatically ensures that the resulting quantum
estimator will inherit most of the properties of the classical estimator. One can, for
example, do this for estimators for the classical die problem that are minimax for
other risk functions (for example, based on relative entropy). The procedure is also
applicable beyond the qubit case and also beyond a SIC-POM. For higher
dimensions, the physicality constraints will involve more inequalities that the probabilities
$\{p_k\}$ must satisfy, but these can, in principle, be imposed as additional constraints for
the choice of the admixing parameter $\lambda$. In every case, a minimax procedure can
be used to choose parameters like $\varepsilon_N$ to avoid the boundary. Note also that, for
problems with an unusual symmetry, one can in fact consider admixing not the 
completely mixed state but some other suitable reference state. 

Given the simplicity of this estimator, we believe it will find much utility in 
tomographic experiments as a first-cut point estimate of the unknown state. Future 
work exploring the effectiveness of this procedure for other estimators, risk func- 
tions, and higher dimensions can also be potentially interesting. Progress towards 
general minimax estimators following the programme set up in this paper will also
certainly be of importance to quantum tomography. 
Appendix A. Geometry of S-POMs

The $K$ outcomes $\Pi_k$ of an S-POM for a $d$-dimensional system obey

$$\mathrm{tr}\{\Pi_k\} = \frac{d}{K}, \qquad \mathrm{tr}\{\Pi_j\Pi_k\} = \frac{d}{K}\Bigl[w\,\delta_{jk} + \frac{1-w}{K-1}(1-\delta_{jk})\Bigr], \tag{A.1}$$

with

$$\frac{1}{K} \leq w \leq 1. \tag{A.2}$$

The lower bound applies when the $\Pi_k$s are multiples of the identity, which is a case
of no interest; the upper bound applies when the outcomes have support in pairwise
orthogonal subspaces. If the outcomes are (subnormalized) rank-$r$ projectors,
we have $w = d/(rK)$ with $1 \leq d/r \leq K$. Of particular importance is the rank-1
situation, for which

$$\Pi_k^2 = \frac{d}{K}\Pi_k, \qquad \mathrm{tr}\{\Pi_j\Pi_k\} = \frac{d^2}{K^2}\Bigl[\delta_{jk} + \frac{K-d}{(K-1)d}(1-\delta_{jk})\Bigr] \tag{A.3}$$

hold.



The set of traceless hermitian operators constitutes a real $(d^2-1)$-dimensional
vector space that we endow with the Hilbert-Schmidt inner product

$$A\cdot B = \mathrm{tr}\{AB\} \quad \text{for } A^\dagger = A,\ B^\dagger = B,\ \mathrm{tr}\{A\} = \mathrm{tr}\{B\} = 0. \tag{A.4}$$



Since the operators $\Pi_k - 1/K$ are in this vector space, we can state (A.1) as

$$\Bigl(\Pi_j - \frac{1}{K}\Bigr)\cdot\Bigl(\Pi_k - \frac{1}{K}\Bigr) = \frac{d(wK-1)}{K^2}\Bigl[\delta_{jk} - \frac{1}{K-1}(1-\delta_{jk})\Bigr]. \tag{A.5}$$

In conjunction with $\sum_k(\Pi_k - 1/K) = 0$, this tells us that the vectors $\Pi_k - 1/K$
define a flat $K$-edged pyramid, if we employ the terminology of Ref. 14. In the
rank-1 situation of (A.3), the prefactor in (A.5) is $(d-1)d/K^2$.

In view of this geometrical property of the S-POM, there can be at most $d^2$
outcomes. Indeed, the S-POM is IC for $K = d^2$, but not when $K < d^2$, and there
are no S-POMs with $K > d^2$. In an alternative way of reasoning, we represent the
vectors $\Pi_k - 1/K$ by the columns of the $K \times K$ matrix

$$\sqrt{\frac{(wK-1)d}{(K-1)K^3}}\begin{pmatrix} K-1 & -1 & \cdots & -1 \\ -1 & K-1 & \cdots & -1 \\ \vdots & \vdots & \ddots & \vdots \\ -1 & -1 & \cdots & K-1 \end{pmatrix} \tag{A.6}$$

and note that this matrix has rank $K-1$, which implies that the $K$ vectors $\Pi_k - 1/K$
span a $(K-1)$-dimensional subspace.

Regarding the statistical operator $\rho$, we note that $\rho - 1/d$ is hermitian and
traceless, and so are the operators $\Lambda_k - 1/d$ that appear in (5),

$$\rho = \sum_{k=1}^{K} p_k\Lambda_k = \frac{1}{d} + \sum_{k=1}^{K}\Bigl(p_k - \frac{1}{K}\Bigr)\Bigl(\Lambda_k - \frac{1}{d}\Bigr), \tag{A.7}$$

where either $p_k - 1/K \to p_k$ or $\Lambda_k - 1/d \to \Lambda_k$ is a permissible replacement, but
not both. The defining property of the $\Lambda_k$s, namely $\mathrm{tr}\{\Pi_j\Lambda_k\} = \delta_{jk}$ or

$$\Bigl(\Pi_j - \frac{1}{K}\Bigr)\cdot\Bigl(\Lambda_k - \frac{1}{d}\Bigr) = \delta_{jk} - \frac{1}{K} = \frac{K-1}{K}\Bigl[\delta_{jk} - \frac{1}{K-1}(1-\delta_{jk})\Bigr], \tag{A.8}$$

implies their standard form,

$$\Lambda_k = \frac{1}{d} + \frac{K(K-1)}{(wK-1)d}\Bigl(\Pi_k - \frac{1}{K}\Bigr) = \frac{K(K-1)}{(wK-1)d}\,\Pi_k - \frac{(1-w)K}{(wK-1)d}. \tag{A.9}$$

If the S-POM is not IC ($K < d^2$), the $\Lambda_k$s are not uniquely determined, because
there is then the option to add a traceless hermitian operator on the right-hand side
of (A.9) that is orthogonal to all $K$ vectors $\Pi_k - 1/K$. It follows that the statistical
operator $\rho$ of (A.7) is not unique unless the S-POM is a SIC-POM. What is unique,
however, is the part of $\rho - 1/d$ that resides in the $(K-1)$-dimensional subspace
spanned by the vectors $\Pi_k - 1/K$.
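The standard form (A.9) can be checked numerically for a non-IC example, the trine ($K = 3$, $d = 2$, rank 1, so $w = d/K$). The sketch assumes NumPy. With the duals of (A.9), the reconstruction (A.7) returns only the part of the Bloch vector that lies in the plane of the trine, which illustrates the non-uniqueness just described. The helper names are ours.

```python
import numpy as np

sigma = np.array([[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]], dtype=complex)
one = np.eye(2, dtype=complex)
d = 2


def outcomes(e):
    """Rank-1 qubit S-POM outcomes Pi_k = (1 + e_k . sigma)/K, as in (A.12)."""
    K = len(e)
    return np.array([(one + np.einsum("i,ijk->jk", ek, sigma)) / K for ek in e])


def standard_duals(Pi, w):
    """Standard form (A.9): Lambda_k = [K(K-1) Pi_k - (1-w) K] / ((wK-1) d)."""
    K = len(Pi)
    return (K * (K - 1) * Pi - (1 - w) * K * one) / ((w * K - 1) * d)


# Trine (K = 3, not IC): unit vectors in the plane orthogonal to (1, 1, 1), as in (A.14).
e_trine = np.array([[2, -1, -1], [-1, 2, -1], [-1, -1, 2]]) / np.sqrt(6)
Pi = outcomes(e_trine)
Lam = standard_duals(Pi, w=d / 3)  # rank-1 case: w = d/K

s = np.array([0.2, -0.4, 0.5])     # an arbitrary Bloch vector
rho = 0.5 * (one + np.einsum("i,ijk->jk", s, sigma))
p = np.real(np.einsum("jk,nkj->n", rho, Pi))
rho_rec = np.einsum("n,njk->jk", p, Lam)  # (A.7) with the standard duals

normal = np.ones(3) / np.sqrt(3)          # normal of the trine plane
s_plane = s - (s @ normal) * normal
print(np.allclose(rho_rec, 0.5 * (one + np.einsum("i,ijk->jk", s_plane, sigma))))
```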

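For the standard $\Lambda_k$s of (A.9), the vectors $\Lambda_k - 1/d$ make up the same flat
pyramid as the vectors $\Pi_k - 1/K$, except that the edges have different lengths.
More specifically, we have

$$\Bigl(\Lambda_j - \frac{1}{d}\Bigr)\cdot\Bigl(\Lambda_k - \frac{1}{d}\Bigr) = \frac{(K-1)^2}{(wK-1)d}\Bigl[\delta_{jk} - \frac{1}{K-1}(1-\delta_{jk})\Bigr], \tag{A.10}$$

and

$$\frac{K}{\sqrt{(wK-1)d}}\Bigl(\Pi_k - \frac{1}{K}\Bigr) = \frac{\sqrt{(wK-1)d}}{K-1}\Bigl(\Lambda_k - \frac{1}{d}\Bigr) \tag{A.11}$$

are the edge vectors of the generic pyramid with unit-length edges.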
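In the qubit case ($d = 2$), the $(d^2-1)$-dimensional real vector space of traceless
hermitian operators is isomorphic to the three-dimensional cartesian space in which
the Bloch ball is embedded. Rank-1 outcomes are of the form

$$\Pi_k = \frac{1}{K}\bigl(1 + \vec e_k\cdot\vec\sigma\bigr) \tag{A.12}$$

with $K = 2$ for the von Neumann measurement, $K = 3$ for the so-called trine
measurement, and $K = 4$ for the tetrahedron measurement of (25). The $\vec e_k$s are unit
vectors, with

$$\Bigl(\Pi_j - \frac{1}{K}\Bigr)\cdot\Bigl(\Pi_k - \frac{1}{K}\Bigr) = \frac{2}{K^2}\,\vec e_j\cdot\vec e_k, \tag{A.13}$$

stating how the inner product in the operator space is related to the scalar product
of three-dimensional vectors. When representing the $\vec e_k$s by three-component
columns of cartesian coordinates, possible choices are

$$(\vec e_1, \vec e_2) = \begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 1 & -1 \end{pmatrix} \ \text{for } K = 2, \qquad (\vec e_1, \vec e_2, \vec e_3) = \frac{1}{\sqrt{6}}\begin{pmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{pmatrix} \ \text{for } K = 3,$$

$$\text{and} \quad (\vec e_1, \vec e_2, \vec e_3, \vec e_4) = \frac{1}{\sqrt{3}}\begin{pmatrix} 1 & -1 & -1 & 1 \\ -1 & 1 & -1 & 1 \\ -1 & -1 & 1 & 1 \end{pmatrix} \ \text{for } K = 4. \tag{A.14}$$

In each case, one easily confirms that $\sum_k \vec e_k = 0$ and

$$\vec e_j\cdot\vec e_k = \delta_{jk} - \frac{1}{K-1}(1-\delta_{jk}), \tag{A.15}$$

as implied by (A.13) with (A.5).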
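A few lines of NumPy suffice to verify the relations (A.3), (A.5), (A.13), and (A.15) for the three choices in (A.14). The helper names in this sketch are ours.

```python
import numpy as np

sigma = np.array([[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]], dtype=complex)
one = np.eye(2, dtype=complex)

# Columns of the matrices in (A.14), stored here as rows.
E = {
    2: np.array([[0, 0, 1], [0, 0, -1]], dtype=float),
    3: np.array([[2, -1, -1], [-1, 2, -1], [-1, -1, 2]]) / np.sqrt(6),
    4: np.array([[1, -1, -1], [-1, 1, -1], [-1, -1, 1], [1, 1, 1]]) / np.sqrt(3),
}


def hs_gram(ops):
    """Hilbert-Schmidt inner products tr{A_j A_k}, the product (A.4)."""
    return np.real(np.einsum("ajk,bkj->ab", ops, ops))


def check(K, d=2):
    e = E[K]
    Pi = np.array([(one + np.einsum("i,ijk->jk", ek, sigma)) / K for ek in e])  # (A.12)
    offdiag = 1 - np.eye(K)
    pyramid = np.eye(K) - offdiag / (K - 1)  # delta_jk - (1 - delta_jk)/(K - 1)
    ok_sum = np.allclose(e.sum(axis=0), 0)   # sum_k e_k = 0
    ok_a15 = np.allclose(e @ e.T, pyramid)   # (A.15)
    ok_a13 = np.allclose(hs_gram(Pi - one / K), 2 / K**2 * (e @ e.T))
    ok_a5 = np.allclose(hs_gram(Pi - one / K), (d - 1) * d / K**2 * pyramid)  # rank-1 prefactor
    ok_a3 = np.allclose(hs_gram(Pi), d**2 / K**2 * (np.eye(K) + (K - d) / ((K - 1) * d) * offdiag))
    return ok_sum and ok_a15 and ok_a13 and ok_a5 and ok_a3


for K in (2, 3, 4):
    print(K, check(K))
```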



Appendix B. Proof of Fact 1

Fact 1. Suppose we choose the square of the Euclidean distance to define the
estimation error:

$$E(\rho, \hat\rho, D_N) = \mathrm{dist}^2\bigl(\hat\rho(D_N), \rho\bigr) = \mathrm{tr}\bigl\{[\hat\rho(D_N) - \rho]^2\bigr\}. \tag{B.1}$$

Then, the unique Bayes estimator for prior distribution $d\mu$ is the mean estimator
$\hat\rho_{\rm ME}[d\mu]$.

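Proof. Consider any estimator $\hat\rho \in S$. Inserting $0 = -\hat\rho_{\rm ME}[d\mu] + \hat\rho_{\rm ME}[d\mu]$ into the
squared Euclidean distance, the estimation error can be written as

$$E(\rho, \hat\rho, D_N) = \mathrm{tr}\bigl\{[\rho - \hat\rho_{\rm ME}(D_N)]^2 + [\hat\rho_{\rm ME}(D_N) - \hat\rho(D_N)]^2 + 2[\rho - \hat\rho_{\rm ME}(D_N)][\hat\rho_{\rm ME}(D_N) - \hat\rho(D_N)]\bigr\}. \tag{B.2}$$

The average risk can then be computed as

$$\begin{aligned} \bar R_N(\hat\rho, d\mu) ={}& \int d\mu(\rho) \sum_{D_N} \mathcal{L}(D_N|\rho)\,\mathrm{tr}\bigl\{[\rho - \hat\rho_{\rm ME}(D_N)]^2\bigr\} \\ &+ \int d\mu(\rho) \sum_{D_N} \mathcal{L}(D_N|\rho)\,\mathrm{tr}\bigl\{[\hat\rho_{\rm ME}(D_N) - \hat\rho(D_N)]^2\bigr\} \\ &+ 2\int d\mu(\rho) \sum_{D_N} \mathcal{L}(D_N|\rho)\,\mathrm{tr}\bigl\{[\rho - \hat\rho_{\rm ME}(D_N)][\hat\rho_{\rm ME}(D_N) - \hat\rho(D_N)]\bigr\}. \end{aligned} \tag{B.3}$$

The third term is zero, by definition of $\hat\rho_{\rm ME}[d\mu]$: for each $D_N$, the second factor does
not depend on $\rho$, and $\int d\mu(\rho)\,\mathcal{L}(D_N|\rho)\,[\rho - \hat\rho_{\rm ME}(D_N)] = 0$. The first term does not
depend on $\hat\rho$ and just gives a fixed constant value. To find the Bayes estimator, we thus
solve the optimization problem

$$\min_{\hat\rho} \int d\mu(\rho) \sum_{D_N} \mathcal{L}(D_N|\rho)\,\mathrm{tr}\bigl\{[\hat\rho_{\rm ME}(D_N) - \hat\rho(D_N)]^2\bigr\}, \tag{B.4}$$

for which $\hat\rho = \hat\rho_{\rm ME}[d\mu]$ is clearly the unique solution.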
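The decomposition used in this proof can be illustrated on a toy problem: a coin ($K = 2$) whose state is drawn from a finite set with prior weights. The states, weights, and $N$ below are hypothetical values chosen only for illustration. The sketch checks that the prior-averaged risk of any estimator equals that of the posterior mean plus a non-negative weighted distance, which is (B.3) with the vanishing cross term removed.

```python
import math

import numpy as np

N = 3
thetas = np.array([0.1, 0.3, 0.5, 0.9])  # hypothetical finite set of coin states p = (theta, 1 - theta)
prior = np.array([0.4, 0.3, 0.2, 0.1])   # hypothetical prior weights d mu

# L[i, n] = probability of n heads in N tosses for state theta_i.
L = np.array([[math.comb(N, n) * t**n * (1 - t) ** (N - n) for n in range(N + 1)] for t in thetas])
joint = prior[:, None] * L               # mu_i L(D_N | theta_i)
marginal = joint.sum(axis=0)             # sum_i mu_i L(D_N | theta_i), positive for every D_N

# Mean estimator: posterior mean of theta for each possible data set n.
post_mean = (joint * thetas[:, None]).sum(axis=0) / marginal


def average_risk(est):
    """Prior-averaged MSE; for K = 2 the squared error of (est, 1 - est) is 2 (est - theta)^2."""
    return float(np.sum(joint * 2.0 * (est[None, :] - thetas[:, None]) ** 2))


rng = np.random.default_rng(0)
other = np.clip(post_mean + rng.normal(scale=0.05, size=N + 1), 0.0, 1.0)
print(average_risk(post_mean), average_risk(other))
# Decomposition: fixed term plus a marginal-weighted squared distance to the posterior mean.
print(average_risk(post_mean) + float(np.sum(marginal * 2.0 * (other - post_mean) ** 2)))
```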

Appendix C. Proof of Fact 2

Fact 2. An estimator with constant risk that is also a Bayes estimator for some
prior distribution $d\mu$ is a minimax estimator. If the estimator is also the unique
Bayes estimator for some $d\mu$, then it is the unique minimax estimator.

Proof. Suppose $\hat\rho_{\rm B}$ is a Bayes estimator for $d\mu$ with constant risk, that is,
$R_N(\rho, \hat\rho_{\rm B}) = R_{N,\rm B}$ for all $\rho$ in $S$. Then $\hat\rho_{\rm B}$ satisfies

$$\int d\mu(\rho)\, R_N(\rho, \hat\rho_{\rm B}) = R_{N,\rm B} = \max_{\rho\in S} R_N(\rho, \hat\rho_{\rm B}). \tag{C.1}$$

Consider another estimator $\hat\rho \neq \hat\rho_{\rm B}$. Then, we have that

$$\max_{\rho\in S} R_N(\rho, \hat\rho) \geq \int d\mu(\rho)\, R_N(\rho, \hat\rho) \geq \int d\mu(\rho)\, R_N(\rho, \hat\rho_{\rm B}) = \max_{\rho\in S} R_N(\rho, \hat\rho_{\rm B}). \tag{C.2}$$

The first inequality is simply a statement that the maximum is not less than the
mean; the second inequality follows from the fact that $\hat\rho_{\rm B}$ is a Bayes estimator.
Equation (C.2) says precisely that $\hat\rho_{\rm B}$ is minimax. If $\hat\rho_{\rm B}$ is also the unique Bayes
estimator for $d\mu$, the second inequality is converted into a strict inequality (">"),
and we have $\max_{\rho\in S} R_N(\rho, \hat\rho) > \max_{\rho\in S} R_N(\rho, \hat\rho_{\rm B})$, which proves the uniqueness
of $\hat\rho_{\rm B}$ as a minimax estimator.
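Fact 2 can be illustrated with the coin ($K = 2$). We assume the family of mean estimators $\hat p_1 = (n+\beta)/(N+2\beta)$ for $n$ heads in $N$ tosses. This is our reading of (13) for $K = 2$, and $\beta = 0$ gives the ML estimator. The choice $\beta = \sqrt{N}/2$ gives constant risk. A brute-force scan over $\theta$ then shows that each other $\beta$ tried has a larger maximum risk. Scanning a few values of $\beta$ is only an illustration, not a proof.

```python
import math

import numpy as np


def risk_coin(theta, N, beta):
    """Exact MSE of the mean estimator (n + beta)/(N + 2 beta) for a K = 2 die (coin)."""
    total = 0.0
    for n in range(N + 1):
        prob = math.comb(N, n) * theta**n * (1 - theta) ** (N - n)
        est = (n + beta) / (N + 2 * beta)
        total += prob * 2.0 * (est - theta) ** 2  # both outcomes contribute equally
    return total


def max_risk(N, beta, thetas=np.linspace(0.0, 1.0, 101)):
    """Maximum of the risk over a grid of coin states that includes theta = 0 and 1/2."""
    return max(risk_coin(t, N, beta) for t in thetas)


N = 9
beta_star = math.sqrt(N) / 2  # beta = sqrt(N)/K with K = 2
for beta in (0.0, 0.5 * beta_star, beta_star, 2.0 * beta_star):
    print(f"beta={beta:.3f}: max risk {max_risk(N, beta):.5f}")
```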